Building infrastructure with an AI coding agent: the safety model

Letting an AI agent touch real infrastructure sounds risky because it is risky, at least when the agent is just a chatbot with shell access.

The useful version is narrower. The agent needs a tool surface it can understand, a policy for when to stop, and enough account context to avoid inventing IDs. That is the safety model behind our Excloud skills and the pi plugin path.

Start read-only

The first phase should not change anything.

An agent can safely run commands like:

exc compute subnet list
exc compute image list
exc compute instancetype list
exc compute key list
exc securitygroup list

Those calls answer basic questions: which subnets exist, which images are available, which instance types can be used, whether the user already has an SSH key, and which security groups are already present.
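One way to enforce a read-only first phase is an allowlist in whatever harness sits between the agent and the shell. The sketch below is an assumption about how that could look, not how the Excloud skills implement it: the names READ_ONLY_COMMANDS, is_read_only, and discovery_filter are hypothetical, and the allowlist contains only the five commands listed above. Matching is exact, so a command with extra arguments is blocked until someone adds it on purpose.

```python
# Hypothetical allowlist: exactly the five read-only commands named above.
READ_ONLY_COMMANDS = {
    ("exc", "compute", "subnet", "list"),
    ("exc", "compute", "image", "list"),
    ("exc", "compute", "instancetype", "list"),
    ("exc", "compute", "key", "list"),
    ("exc", "securitygroup", "list"),
}


def is_read_only(command: str) -> bool:
    """Return True only if the command exactly matches an allowlisted entry."""
    return tuple(command.split()) in READ_ONLY_COMMANDS


def discovery_filter(commands: list[str]) -> tuple[list[str], list[str]]:
    """Split proposed commands into (allowed, blocked) for the discovery phase."""
    allowed = [c for c in commands if is_read_only(c)]
    blocked = [c for c in commands if not is_read_only(c)]
    return allowed, blocked


if __name__ == "__main__":
    proposed = ["exc compute subnet list", "exc compute create"]
    allowed, blocked = discovery_filter(proposed)
    print("allowed:", allowed)
    print("blocked:", blocked)
```

Exact matching is deliberately strict. It trades convenience for a discovery phase that cannot drift into changing anything.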

Plan before spend

After discovery, the agent should produce a plan before it creates billable resources.

A good plan names:

The resources it will create.
The instance types and disk sizes.
Whether public IPv4 will be allocated.
Which security group rules will exist.
The estimated hourly cost.
The cleanup path.

That lets the human approve the shape, not just the vibe.
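The plan is easier to approve when it is a structure rather than a paragraph. Here is a minimal sketch, assuming a plan can be modeled as a small record with one field per item in the list above. The class names, field names, and the sample values (disk sizes, rule text, cost string, cleanup placeholders) are all hypothetical and only illustrate the shape.

```python
from dataclasses import dataclass


@dataclass
class PlannedInstance:
    name: str
    instance_type: str
    disk_gb: int
    public_ipv4: bool


@dataclass
class Plan:
    instances: list[PlannedInstance]
    security_group_rules: list[str]
    estimated_hourly_cost: str  # filled in from the rate card, never guessed
    cleanup_commands: list[str]

    def missing_fields(self) -> list[str]:
        """Names of sections that are still empty; such a plan is not ready for approval."""
        missing = []
        if not self.instances:
            missing.append("instances")
        if not self.security_group_rules:
            missing.append("security_group_rules")
        if not self.estimated_hourly_cost:
            missing.append("estimated_hourly_cost")
        if not self.cleanup_commands:
            missing.append("cleanup_commands")
        return missing

    def render(self) -> str:
        """Format the plan as plain text for the human to approve."""
        lines = ["Plan:"]
        for vm in self.instances:
            ip = "public IPv4" if vm.public_ipv4 else "no public IPv4"
            lines.append(f"  create {vm.name}: {vm.instance_type}, {vm.disk_gb} GB disk, {ip}")
        for rule in self.security_group_rules:
            lines.append(f"  rule: {rule}")
        lines.append(f"  estimated cost: {self.estimated_hourly_cost}")
        for cmd in self.cleanup_commands:
            lines.append(f"  cleanup: {cmd}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Sample values are placeholders, not real sizes, rules, or commands.
    plan = Plan(
        instances=[
            PlannedInstance("web", "t1a.medium", 20, True),
            PlannedInstance("db", "m1a.large", 100, False),
        ],
        security_group_rules=["allow tcp/443 from anywhere to web"],
        estimated_hourly_cost="<total from rate card> per hour",
        cleanup_commands=["<command that removes web>", "<command that removes db>"],
    )
    print(plan.render())
```

A harness could refuse to ask for approval while missing_fields() returns anything, which keeps the cost and the cleanup path from being quietly left out.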

Cost belongs in the plan

The agent should not say “this may cost money” and move on. It should look up each chosen resource on the actual rate card, multiply the rate by the quantity, and total the result.

For example, if it plans one t1a.medium, one m1a.large, and one public IPv4, it can show the hourly total before running compute create.

That changes the interaction. The approval is not “trust me”; it is “this is the command set and this is the meter.”
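The arithmetic is small enough to sketch. The rates below are placeholders invented for illustration, not Excloud prices, and the names RATE_CARD and hourly_total are hypothetical. A real agent would read the rates from the actual rate card. The function refuses to estimate when a rate is missing rather than guessing.

```python
from decimal import Decimal

# PLACEHOLDER rates for illustration only. These are NOT real prices;
# a real agent would load them from the provider's rate card.
RATE_CARD = {
    "t1a.medium": Decimal("0.02"),
    "m1a.large": Decimal("0.08"),
    "public-ipv4": Decimal("0.005"),
}


def hourly_total(items: dict[str, int], rate_card: dict[str, Decimal]) -> Decimal:
    """Sum rate * quantity for every planned item; fail loudly on unknown items."""
    total = Decimal("0")
    for sku, quantity in items.items():
        if sku not in rate_card:
            raise KeyError(f"no rate for {sku!r}; refusing to guess")
        total += rate_card[sku] * quantity
    return total


if __name__ == "__main__":
    planned = {"t1a.medium": 1, "m1a.large": 1, "public-ipv4": 1}
    for sku, quantity in planned.items():
        print(f"{quantity} x {sku} at {RATE_CARD[sku]} per hour")
    print(f"Estimated total: {hourly_total(planned, RATE_CARD)} per hour")
```

Decimal is used instead of float so that small per-hour rates add up without rounding noise in the number the human is asked to approve.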

Confirmation gates

The safety model has two gates:

Billable creation needs approval.
Destructive actions need approval.

Destructive means more than VM termination. It includes volume deletion, snapshot deletion, public IP release, security group deletion, rule deletion, Kubernetes cluster deletion, API key deletion, and policy changes.

The agent can recommend cleanup. It should not perform cleanup silently.
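The two gates can be written down as a lookup plus one check. This is a sketch under assumptions: the action labels (such as "vm.create" or "volume.delete") are hypothetical names for the operations listed above, not exc subcommands, and the billable set is only an example covering instance creation and public IPv4 allocation.

```python
from enum import Enum


class Gate(Enum):
    NONE = "none"
    BILLABLE = "billable creation"
    DESTRUCTIVE = "destructive action"


# Hypothetical labels for the destructive operations listed above.
DESTRUCTIVE_ACTIONS = {
    "vm.terminate",
    "volume.delete",
    "snapshot.delete",
    "public_ip.release",
    "securitygroup.delete",
    "securitygroup.rule.delete",
    "kubernetes.cluster.delete",
    "apikey.delete",
    "policy.change",
}

# Hypothetical labels for operations that start the meter.
BILLABLE_ACTIONS = {"vm.create", "public_ip.allocate"}


def gate_for(action: str) -> Gate:
    """Classify an action into the gate it must pass."""
    if action in DESTRUCTIVE_ACTIONS:
        return Gate.DESTRUCTIVE
    if action in BILLABLE_ACTIONS:
        return Gate.BILLABLE
    return Gate.NONE


def may_run(action: str, approvals: set[Gate]) -> bool:
    """An action runs only if it is ungated or the human approved its gate."""
    gate = gate_for(action)
    return gate is Gate.NONE or gate in approvals


if __name__ == "__main__":
    approvals = {Gate.BILLABLE}  # the user said "go" to the plan, nothing more
    for action in ["subnet.list", "vm.create", "vm.terminate"]:
        print(action, "->", "run" if may_run(action, approvals) else "ask first")
```

Keeping the gates separate means that approving a plan to create resources does not also approve tearing them down. A stricter harness might track approval per action rather than per gate.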

Why pi matters here

We built the workflow for coding agents generally, but pi is a first-class path for this model. The pi plugin/integration gives pi the same Excloud-specific operating rules: inspect first, plan, confirm before spend, confirm before destruction, and show commands as it goes.

That is the point. The agent should not be special because it has hidden access. It should be useful because it drives the same CLI a human can drive, with a stricter habit of stopping at the dangerous parts.
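“Show commands as it goes” can be as simple as echoing the exact command line before handing it to the executor. The sketch below assumes a hypothetical wrapper, run_visibly, that takes the executor as a parameter. The fake executor only stands in for a real subprocess call so the example runs anywhere.

```python
import shlex
from typing import Callable


def run_visibly(
    argv: list[str],
    execute: Callable[[list[str]], str],
    log: Callable[[str], None] = print,
) -> str:
    """Show the exact command line first, then run it and return its output."""
    log("$ " + shlex.join(argv))
    return execute(argv)


def fake_execute(argv: list[str]) -> str:
    """Stand-in executor for the example; a real one would invoke the CLI."""
    return "(output of " + " ".join(argv) + ")"


if __name__ == "__main__":
    output = run_visibly(["exc", "compute", "subnet", "list"], fake_execute)
    print(output)
```

Because the logged line is the same string a human could paste into a terminal, the transcript doubles as a record of what was actually run.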

The result

The ideal session is simple:

User: create a public web VM and a private database VM
Agent: I will inspect subnets, images, keys, and prices first
Agent: here is the plan and hourly cost
User: go
Agent: runs explicit commands
Agent: here are the IDs, IPs, and cleanup command
Agent: want me to tear it down?

That is the difference between an infrastructure assistant and an infrastructure liability.