System Overview
This chapter is a 30,000-foot view of how a build request travels from your
keyboard (or browser) to a compiled .bin and back. The next three chapters
zoom in on each layer.
The cast
| Component | Where it runs | What it does |
|---|---|---|
| Your client | Your laptop | Either an MCP-capable AI client (Claude Code, Cursor, …) or the esphome.cloud browser wizard. |
| espctl mcp serve | Your laptop or the build machine | The MCP server. Translates MCP tool calls into local plans, or remote build requests over WebRTC. |
| Build server | A public Linux host | Issues build permissions, brokers connection setup, assigns jobs to build machines. Never sees the build itself. |
| Build machine | A private Linux host with the IDF toolchain | Runs the actual build inside a sandbox. Communicates with your client over WebRTC data channels. |
| The store | Disk on the build machine host | The on-disk directory containing all installed IDF versions and toolchains. |
How a build flows
┌────────────────┐
│  Your client   │
│(IDE or browser)│
└────┬───────────┘
     │ ① "Build me an esp32s3 firmware"
     ▼
┌────────────────┐   ② permission request     ┌─────────────────┐
│   MCP server   │───────────────────────────►│  Build server   │
│    (espctl)    │◄───────────────────────────│ - issues permit │
└────┬───────────┘   ③ permit + ICE servers   │ - picks machine │
     │                                        └─────────┬───────┘
     │ ④ SDP offer (POST /signaling/.../offer)          │
     │                                                  │ ⑤ live
     │                                                  ▼   updates
     │                                           ┌─────────────┐
     │                                           │Build machine│
     │                                           └────┬────────┘
     │ ⑥ WebRTC peer connection                       │
     │ ◄──────────────────────────────────────────────►
     │
     │ ⑦ BuildRequest on espctl channel
     │ ⑧ Logs streaming on pty channel
     │ ⑨ Firmware bytes on firmware channel
     ▼
┌────────────────┐
│  Result: .bin  │
│ + size report  │
│ + manifest     │
└────────────────┘
The numbered steps:
1. You ask your AI client (or click a button in the browser wizard).
2. The MCP server (or the browser) POSTs a permission request to the build server: “I want a build session, with channels espctl,pty,firmware, for up to 30 seconds.” (Long-lived sessions get separate, longer-lived permissions.)
3. The build server returns a signed permission token, picks one or more candidate build machines that can run the job, and includes a list of fallback relay servers for the WebRTC handshake.
4. The MCP server posts an SDP offer to the connection setup endpoint.
5. The chosen build machine, which checks in regularly with the build server for new jobs, picks up the permission via live updates and prepares to receive the offer.
6. WebRTC connection negotiation happens. The two sides exchange candidates through the build server (which acts purely as a relay; it never sees the contents of the SDP body) and converge on either a direct peer-to-peer connection or one routed through a fallback relay server.
7. With the data channels open, the client sends a BuildRequest on the espctl channel. The build machine verifies the permission token signature locally and starts the build.
8. As the build runs, the build machine streams idf.py stdout/stderr back on the pty channel and structured pipeline events on the espctl channel.
9. When the build finishes successfully, the build machine chunks the firmware binary and streams it back on the firmware channel, with a final SHA-256 for verification.
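The last step, chunked firmware followed by a final SHA-256, can be illustrated with a short sketch. This is not the espctl wire format: the chunk size, the function names, and the assumption that chunks arrive in order are all hypothetical, chosen only to show why the receiver hashes what it actually received before trusting the image.

```python
import hashlib
from typing import Iterable


def chunk_firmware(image: bytes, chunk_size: int = 16 * 1024) -> list[bytes]:
    """Split an image into fixed-size chunks; the last one may be shorter.

    The 16 KiB default is an arbitrary choice for this sketch.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [image[i:i + chunk_size] for i in range(0, len(image), chunk_size)]


def reassemble_firmware(chunks: Iterable[bytes], expected_sha256: str) -> bytes:
    """Concatenate chunks in arrival order and check the final digest."""
    digest = hashlib.sha256()
    parts = []
    for chunk in chunks:
        digest.update(chunk)  # hash incrementally as chunks arrive
        parts.append(chunk)
    if digest.hexdigest() != expected_sha256.lower():
        raise ValueError("firmware digest mismatch; discard the image")
    return b"".join(parts)


# Sender side (stand-in for the build machine).
image = bytes(range(256)) * 200  # 51,200 bytes of dummy firmware
chunks = chunk_firmware(image)
final_digest = hashlib.sha256(image).hexdigest()

# Receiver side (stand-in for the client).
received = reassemble_firmware(chunks, final_digest)
```

Hashing incrementally means the receiver never needs a second pass over the image, and a single corrupted or missing chunk causes the whole transfer to be rejected rather than flashed.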
The build itself runs inside a sandbox on the build machine – it cannot read or write anything outside the workspace directory the build machine set up for it.
Three layers, three responsibilities
This is the three-layer model the rest of the architecture chapters expand on:
- Build Server & Connection Setup – public, stateless, knows who is asking but not what is being built. Issues build permissions. Relays connection setup messages. Is never in a position to decrypt anything.
- WebRTC Build Machine & Data Channels – private, runs the build, enforces channel whitelists and bandwidth limits per client. Has full code execution authority but only inside the sandbox.
- Permissions & Security – the signing protocol that ties the two together. A build permission is a signed token saying “this user gets these channels for this long”.
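To make “these channels for this long” concrete, here is a minimal sketch of the check a build machine might run after it has verified the token signature. The `Permit` fields and the `may_open_channel` helper are hypothetical names for illustration; the real token layout is not described in this chapter, and signature verification itself is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permit:
    """Hypothetical decoded claims of a build permission.

    Assumes the signature has already been verified elsewhere.
    """
    user: str
    channels: frozenset[str]
    issued_at: float    # Unix seconds
    ttl_seconds: float

    def is_live(self, now: float) -> bool:
        """True while the permit is inside its validity window."""
        return self.issued_at <= now < self.issued_at + self.ttl_seconds


def may_open_channel(permit: Permit, channel: str, now: float) -> bool:
    """Allow a channel only if the permit is live and whitelists it."""
    return permit.is_live(now) and channel in permit.channels


# The request from the walkthrough: three channels for up to 30 seconds.
permit = Permit(
    user="alice",
    channels=frozenset({"espctl", "pty", "firmware"}),
    issued_at=1_000.0,
    ttl_seconds=30.0,
)

allowed = may_open_channel(permit, "pty", now=1_010.0)       # whitelisted, live
unlisted = may_open_channel(permit, "shell", now=1_010.0)    # not whitelisted
expired = may_open_channel(permit, "firmware", now=1_031.0)  # past the window
```

Whether expiry is checked only when a channel opens or throughout the session is not stated here; the sketch checks it at open time only.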
Why this shape?
The architecture is structured to keep the public surface (the build server) stateless and untrusted. Compromising the build server gets you the ability to issue permissions, but a permission is useless without a build machine willing to honor it – and the build machine verifies permission signatures locally, using a public key that was built into it at compile time.
Conversely, compromising a build machine gets you whatever code is running in the sandbox right now, but the build machine cannot impersonate other users or issue permissions. Build machines are essentially “computers that run untrusted code inside a sandbox”, which is exactly the threat model sandboxes were built for.
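The data channels themselves are direct peer-to-peer when possible, so build logs and firmware binaries don’t transit the build server. When a session falls back to a relay, the traffic is still protected by the DTLS encryption that WebRTC applies to every data channel, so the relay only forwards ciphertext. This means the operator running the build server cannot read your build logs or your firmware images even if they wanted to.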