We built our cloud on bare metal in Mumbai. Here's what we learned.

Many regional clouds start as a reseller agreement and a logo. Excloud took a different path: we operate the hardware, network, control plane, CLI, and console ourselves.

That choice shapes the product. Capacity planning, routing, storage behavior, API design, and support all sit inside the same operating loop instead of being passed through another provider.

This is the honest version of what that took, including the parts that didn’t go to plan.

Owning the metal

We buy and operate the machines ourselves rather than reselling another provider’s VMs. The fleet is AMD EPYC with NVMe across the board and high-speed networking on every compute size. The distinction matters: customers run on infrastructure Excloud operates directly.

The operating model is unglamorous: hardware, power, space, network, and the people to run it. Owning that layer lets us make precise decisions about instance shapes, storage behavior, network policy, and support instead of inheriting another platform’s defaults.

The flip side, which nobody warns you about: you now own capacity planning. There’s no infinite pool to autoscale into. When demand spikes, the answer is a purchase order and a lead time, not an API call. We’ve gotten good at forecasting and at keeping headroom, but it is a real constraint. Our current public region and zone are documented in the compute docs; we do not claim multi-region or multi-AZ coverage we do not have.
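Without an infinite pool, capacity planning reduces to a reorder-point check: will free capacity fall below the headroom we keep before hardware ordered today can be serving traffic? A minimal sketch, assuming a single pooled vCPU count per zone and a steady weekly growth forecast. The `CapacityState` fields, the vCPU unit, and the numbers in the usage note are hypothetical, not Excloud’s actual planning model.

```python
from dataclasses import dataclass


@dataclass
class CapacityState:
    free_vcpus: int             # unallocated vCPUs in the zone today (hypothetical unit)
    weekly_growth_vcpus: float  # forecast net new vCPU demand per week
    lead_time_weeks: float      # purchase order to racked-and-serving
    headroom_vcpus: int         # buffer kept for spikes and hardware failures


def must_order_now(state: CapacityState) -> bool:
    """Return True if forecast demand during the lead time would eat into headroom."""
    demand_during_lead_time = state.weekly_growth_vcpus * state.lead_time_weeks
    return state.free_vcpus - demand_during_lead_time < state.headroom_vcpus
```

With 2,000 free vCPUs, 150 vCPUs of weekly growth, a 10-week lead time, and 768 vCPUs of headroom, the order is already due: only 500 would be left when the hardware lands. The point of the sketch is that lead time, not today’s utilization, sets the order date. A real planner would also work in whole servers and revisit the forecast often.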

Running our own network

Network design affects the way a cloud feels in production. Latency, routing hygiene, abuse handling, and data transfer policy are not afterthoughts; they are part of the product.

Excloud operates AS152129 with its own address space and peers at Indian internet exchanges and over private interconnects. When a customer in India pulls data off one of our VMs, that traffic can hand off to their network a short hop away. That is the practical reason we invest in peering, route hygiene, and direct network operations.

Running your own ASN comes with responsibilities you don’t get to opt out of. We’re a MANRS member, which means we filter our route announcements, prevent leaks, and keep our routing data accurate: the boring hygiene that keeps the internet from melting. We’re listed on PeeringDB, we support customers who bring their own IP space, and we encourage those customers to join MANRS too. None of this makes for a good marketing page, but it’s the difference between a real network and a NAT box with ambitions.
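The route-filtering part of that hygiene comes down to one rule: only announce prefixes that you, or a customer whose space you have verified, actually hold, and nothing more specific than other networks will accept. A minimal sketch using Python’s standard `ipaddress` module, assuming a static allowlist. The documentation-range prefixes and the /24 and /48 length limits are placeholders. Real filters are built from IRR and RPKI data and enforced in router configuration, not in application code like this.

```python
import ipaddress

# Placeholder allowlist: RFC 5737 / RFC 3849 documentation ranges stand in for
# real Excloud and customer (BYOIP) space. In production this would be derived
# from IRR and RPKI data rather than hard-coded.
ALLOWED_PREFIXES = [
    ipaddress.ip_network("203.0.113.0/24"),   # "our" IPv4 space (placeholder)
    ipaddress.ip_network("198.51.100.0/24"),  # a customer's BYOIP range (placeholder)
    ipaddress.ip_network("2001:db8::/32"),    # "our" IPv6 space (placeholder)
]

# Most-specific prefix length we are willing to announce, per IP version.
MAX_PREFIX_LEN = {4: 24, 6: 48}


def may_announce(prefix: str) -> bool:
    """Return True only if `prefix` sits inside space we hold and is not too specific."""
    net = ipaddress.ip_network(prefix)  # raises ValueError if host bits are set
    if net.prefixlen > MAX_PREFIX_LEN[net.version]:
        return False  # too specific for most networks to accept
    # subnet_of() is False for supernets, so we never announce more than we hold.
    # The version check runs first because subnet_of() rejects mixed IPv4/IPv6.
    return any(
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in ALLOWED_PREFIXES
    )
```

Rejecting the supernet case matters as much as rejecting foreign space: announcing 203.0.112.0/23 when you hold only 203.0.113.0/24 is exactly the kind of leak filtering is meant to stop.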

Writing the control plane ourselves

We wrote the orchestration, the API, the CLI (exc), and the console in-house. The temptation early on is to bolt a UI onto OpenStack and call it a day. We didn’t, because the abstractions you inherit from a general-purpose stack are the abstractions you’re then stuck explaining to customers forever.

A concrete payoff: the exc CLI is generated from a live OpenAPI surface, so the commands and flags track the API as it changes instead of drifting out of date. That same surface is what lets an AI coding agent drive the whole platform in plain English, which is a story for another post. It only works because we control the interface end to end.
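The general idea behind generating a CLI from an API description can be shown with the standard library. Walk the OpenAPI `paths` object, turn each operation into a subcommand, and turn each parameter into a flag or positional argument. This is a hypothetical sketch, not how exc is built. The spec fragment, the `vm-*` operationIds, and the `cli-sketch` program name are invented for illustration.

```python
import argparse

# A tiny, hypothetical OpenAPI fragment; these are not real exc operations.
SPEC = {
    "paths": {
        "/vms": {
            "get": {"operationId": "vm-list", "parameters": [
                {"name": "zone", "in": "query", "required": False, "schema": {"type": "string"}},
            ]},
            "post": {"operationId": "vm-create", "parameters": [
                {"name": "name", "in": "query", "required": True, "schema": {"type": "string"}},
                {"name": "vcpus", "in": "query", "required": True, "schema": {"type": "integer"}},
            ]},
        },
        "/vms/{id}": {
            "delete": {"operationId": "vm-delete", "parameters": [
                {"name": "id", "in": "path", "required": True, "schema": {"type": "string"}},
            ]},
        },
    }
}

# Map OpenAPI schema types to argparse converters (booleans omitted for brevity).
TYPES = {"string": str, "integer": int}


def build_parser(spec: dict) -> argparse.ArgumentParser:
    """Derive one subcommand per operation and one argument per parameter."""
    parser = argparse.ArgumentParser(prog="cli-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    for path, operations in spec["paths"].items():
        for method, op in operations.items():
            cmd = sub.add_parser(op["operationId"], help=f"{method.upper()} {path}")
            # Remember which HTTP call this subcommand maps to.
            cmd.set_defaults(http_method=method.upper(), http_path=path)
            for param in op.get("parameters", []):
                convert = TYPES.get(param.get("schema", {}).get("type", "string"), str)
                if param["in"] == "path":
                    cmd.add_argument(param["name"], type=convert)  # positional argument
                else:
                    cmd.add_argument(
                        f"--{param['name']}",
                        type=convert,
                        required=param.get("required", False),
                    )
    return parser


args = build_parser(SPEC).parse_args(["vm-create", "--name", "web-1", "--vcpus", "4"])
print(args.command, args.http_method, args.http_path, args.vcpus)
```

Because the parser is derived from the spec at startup, adding a parameter to the API adds a flag to the CLI without anyone editing a command table by hand, which is the drift a generated CLI avoids.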

The cost of building it yourself is that every primitive is a decision you have to make and then defend. Block storage is the clearest example. We meter IOPS and throughput separately from capacity so teams can choose the shape they need. That takes more explaining, so we write docs and examples to make sure nobody is surprised. Owning the stack means owning that explanation.
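Why meter IOPS and throughput separately? The two are linked by block size (throughput = IOPS × block size), so which limit a workload hits first depends on its I/O pattern. A minimal sketch, assuming a hypothetical volume provisioned at 3,000 IOPS and 125 MiB/s. These limits are illustrative only and are not Excloud’s tiers or defaults.

```python
def iops_ceiling_from_throughput(throughput_mib_s: float, block_kib: float) -> float:
    """Highest IOPS a throughput cap permits at a given block size."""
    return throughput_mib_s * 1024 / block_kib


def effective_iops(iops_limit: float, throughput_mib_s: float, block_kib: float) -> float:
    """IOPS actually achievable: whichever of the two limits binds first."""
    return min(iops_limit, iops_ceiling_from_throughput(throughput_mib_s, block_kib))


def binding_limit(iops_limit: float, throughput_mib_s: float, block_kib: float) -> str:
    """Name the limit a workload with this block size runs into first."""
    if iops_limit <= iops_ceiling_from_throughput(throughput_mib_s, block_kib):
        return "iops"
    return "throughput"


def effective_throughput_mib_s(iops_limit: float, throughput_mib_s: float, block_kib: float) -> float:
    """Throughput actually achieved at the effective IOPS."""
    return effective_iops(iops_limit, throughput_mib_s, block_kib) * block_kib / 1024


# Hypothetical volume: 3,000 IOPS and 125 MiB/s.
for block in (4, 256):
    print(block, binding_limit(3000, 125, block), effective_iops(3000, 125, block))
```

A 4 KiB random workload on that volume tops out at 3,000 IOPS (about 11.7 MiB/s) and never touches the throughput limit, while a 256 KiB sequential workload stalls at 500 IOPS because it saturates 125 MiB/s first. Capacity is a third, independent axis: a small database volume may need high IOPS but little space, which is the kind of shape separate metering allows.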

The things we got wrong

Two worth admitting.

The speed claim. For a while the whole site led with a provisioning-speed headline. It was catchy, but it was the kind of number you can’t honestly stand behind in every case, so we retired it. The site now leads with the platform itself: compute, storage, networking, docs, and support we can keep improving in public. If you ever see us quote a speed with a number, hold us to it.

Port 25. Early on we let outbound SMTP through, like the open internet of 2005. A handful of abusive instances started damaging the spam reputation of our address space, which is exactly how a young network gets its IPs blacklisted and ruins deliverability for every honest customer on the same ranges. We now block port 25 from egressing to the internet while leaving it open inside your private subnet for testing. It’s a small thing we learned the hard way, and we wrote it up because every cloud eventually makes this call and few explain why.
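The policy is simple to state: TCP traffic to port 25 is dropped unless its destination is inside the customer’s private subnet. A minimal sketch of that decision, assuming a hypothetical private subnet of 10.0.0.0/24. The function name and subnet are illustrative, the sketch models only port 25 (the post does not describe other mail ports), and the real policy is enforced in the network path rather than by code like this.

```python
import ipaddress

# Hypothetical customer private subnet; real subnets are per-customer.
PRIVATE_SUBNET = ipaddress.ip_network("10.0.0.0/24")

# Ports blocked from egressing to the internet.
BLOCKED_EGRESS_PORTS = {25}


def allow_egress(dst_ip: str, dst_port: int, protocol: str = "tcp") -> bool:
    """Decide whether a new outbound connection is permitted."""
    dst = ipaddress.ip_address(dst_ip)
    if dst in PRIVATE_SUBNET:
        return True  # traffic inside the private subnet is unaffected, SMTP tests included
    # Anything leaving for the internet is allowed unless it targets a blocked port.
    return not (protocol == "tcp" and dst_port in BLOCKED_EGRESS_PORTS)
```

Checking the destination first keeps the exception narrow: a test mail server on 10.0.0.15 still receives SMTP, while the same connection to a public address is refused.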

Why we’d do it again

Building on bare metal is slower and less forgiving than reselling. You carry hardware lead times, network operations, and a control plane that’s entirely your problem at 3 a.m.

But the thing we set out to prove is operational, not comparative: a cloud platform can be built close to the hardware, with its own network, its own control plane, and a support loop run by the people operating it. That is the company.

If you want to see it directly, start a box, read the docs, and tell us where the platform should be sharper.