System Overview

This chapter is a 30,000-foot view of how a build request travels from your keyboard (or browser) to a compiled .bin and back. The next three chapters zoom in on each layer.

The cast

Component	Where it runs	What it does
Your client	Your laptop	Either an MCP-capable AI client (Claude Code, Cursor, …) or the esphome.cloud browser wizard.
`espctl mcp serve`	Your laptop or the build machine	The MCP server. Translates MCP tool calls into local plans, or into remote build requests sent over WebRTC.
Build server	A public Linux host	Issues build permissions, brokers connection setup, assigns jobs to build machines. Never sees the build itself.
Build machine	A private Linux host with the ESP-IDF toolchain	Runs the actual build inside a sandbox. Communicates with your client over WebRTC data channels.
The store	Disk on the build machine host	The on-disk directory containing all installed IDF versions and toolchains.

How a build flows

┌──────────────────┐
│   Your client    │
│ (IDE or browser) │
└────┬─────────────┘
     │ ① "Build me an esp32s3 firmware"
     ▼
┌──────────────────┐  ② permission request      ┌─────────────────┐
│ MCP server       │───────────────────────────►│ Build server    │
│ (espctl)         │◄───────────────────────────│ - issues permit │
└────┬─────────────┘  ③ permit + ICE servers    │ - picks machine │
     │                                          └────────┬────────┘
     │ ④ SDP offer (POST /signaling/.../offer)           │
     │                                                   │ ⑤ live updates
     │                                                   ▼
     │                                            ┌─────────────┐
     │                                            │Build machine│
     │                                            └──────┬──────┘
     │ ⑥ WebRTC peer connection                          │
     │◄──────────────────────────────────────────────────►
     │
     │ ⑦ BuildRequest on espctl channel
     │ ⑧ Logs streaming on pty channel
     │ ⑨ Firmware bytes on firmware channel
     ▼
┌──────────────────┐
│ Result: .bin     │
│ + size report    │
│ + manifest       │
└──────────────────┘

The numbered steps:

1. You ask your AI client for a build (or click a button in the browser wizard).
2. The MCP server (or the browser) POSTs a permission request to the build server: “I want a build session, with channels espctl, pty, firmware, for up to 30 seconds.” (Long-lived sessions get separate, longer-lived permissions.)
3. The build server returns a signed permission token and picks one or more candidate build machines that can run the job. It also includes a list of ICE servers, among them fallback relay (TURN) servers, for establishing the WebRTC connection.
4. The MCP server posts an SDP offer to the connection setup endpoint (POST /signaling/.../offer).
5. The chosen build machine stays in regular contact with the build server to pick up new jobs. It receives the permission via live updates and prepares to receive the offer.
6. The WebRTC connection is negotiated. The two sides exchange candidates through the build server, which acts purely as a relay and never sees the contents of the SDP body. They converge on either a direct peer-to-peer connection or one routed through a fallback relay server.
7. With the data channels open, the client sends a `BuildRequest` on the espctl channel. The build machine verifies the permission token signature locally and starts the build.
8. As the build runs, the build machine streams `idf.py` stdout/stderr back on the pty channel and structured pipeline events on the espctl channel.
9. When the build finishes successfully, the build machine splits the firmware binary into chunks and streams them back on the firmware channel. A final SHA-256 digest follows for verification.
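Step 9 can be made concrete with a receiver-side sketch. The wire framing here is an assumption, not the documented format. Binary messages on the firmware channel are taken to be raw image chunks in order. The final text message is taken to be JSON carrying the digest, such as {"sha256": "<hex>"}. The class name `FirmwareAssembler` is hypothetical.

```python
from __future__ import annotations

import hashlib
import json


class FirmwareAssembler:
    """Hypothetical receiver for the `firmware` channel.

    Assumed framing: bytes messages are image chunks in order; a final str
    message is JSON such as {"sha256": "<hex digest>"}.
    """

    def __init__(self) -> None:
        self._hasher = hashlib.sha256()
        self._chunks: list[bytes] = []
        self.image: bytes | None = None  # set only after the digest matches

    def on_message(self, payload: bytes | str) -> None:
        if self.image is not None:
            raise RuntimeError("firmware already finalized")
        if isinstance(payload, bytes):
            # Binary frame: keep the chunk and hash it as it arrives.
            self._chunks.append(payload)
            self._hasher.update(payload)
            return
        # Text frame: the trailer carrying the expected SHA-256.
        expected = json.loads(payload)["sha256"]
        actual = self._hasher.hexdigest()
        if actual != expected:
            raise ValueError(f"SHA-256 mismatch: expected {expected}, got {actual}")
        self.image = b"".join(self._chunks)
```

Hashing each chunk as it arrives avoids a second pass over the image. `image` stays `None` until the digest checks out, so a truncated or corrupted transfer is never mistaken for a finished build.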

The build itself runs inside a sandbox on the build machine – it cannot read or write anything outside the workspace directory the build machine set up for it.

Three layers, three responsibilities

This is the three-layer model the rest of the architecture chapters expand on:

Build Server & Connection Setup – public, stateless; knows who is building but not what. Issues build permissions and relays connection setup messages. It never has the authority to decrypt anything.
WebRTC Build Machine & Data Channels – private; runs the build and enforces channel whitelists and bandwidth limits per client. Has full code execution authority, but only inside the sandbox.
Permissions & Security – the signing protocol that ties the two together. A build permission is a signed token saying “this user gets these channels for this long”.
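The build machine's per-client enforcement can be sketched as a single admission check. The chapter does not say how bandwidth limits are implemented. This sketch assumes a token bucket, which is one common choice. The names `ClientPolicy` and `admit` are hypothetical, and the set of allowed channels would plausibly come from the client's permission.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


class ClientPolicy:
    """Hypothetical per-client policy: channel whitelist plus a token bucket."""

    def __init__(
        self,
        allowed_channels: Iterable[str],
        rate_bytes_per_s: float,
        burst_bytes: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.allowed = frozenset(allowed_channels)
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.clock = clock
        self.tokens = burst_bytes  # start with a full bucket
        self.last = clock()

    def admit(self, channel: str, size: int) -> bool:
        """Return True if `size` bytes may be sent on `channel` right now."""
        if channel not in self.allowed:
            return False  # channel not whitelisted for this client
        now = self.clock()
        # Refill for the elapsed time, never beyond the burst capacity.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size > self.tokens:
            return False  # over budget: caller should queue or drop
        self.tokens -= size
        return True
```

Keeping one bucket per client, rather than one per machine, means a single noisy client cannot starve the others.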

The architecture is structured to keep the public surface (the build server) stateless and untrusted. Compromising the build server gets you the ability to issue permissions. But a permission is useless without a build machine willing to honor it, and the build machine verifies permission signatures locally, using a public key built into it at compile time.
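Local verification means the build machine needs nothing from the build server at request time. The sketch below assumes a hypothetical token layout: '<payload>.<signature>' in unpadded URL-safe base64, with JSON claims `sub`, `channels` and `exp`. The chapter does not name the signature scheme, so the sketch takes the signature check as a callable. In practice that callable would wrap an asymmetric verification (for example Ed25519 from a crypto library) against the public key compiled into the build machine.

```python
from __future__ import annotations

import base64
import json
import time
from typing import Callable

# Hypothetical hook: checks `signature` over `payload` with the public key
# compiled into the build machine. It never calls the build server.
Verifier = Callable[[bytes, bytes], bool]


class InvalidPermission(Exception):
    """Raised when a build permission must not be honored."""


def _b64decode(s: str) -> bytes:
    # Unpadded URL-safe base64, as is common in compact tokens.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


def check_permission(
    token: str, requested: set[str], verify: Verifier, now: float | None = None
) -> dict:
    """Validate a '<payload>.<signature>' token (hypothetical layout)."""
    try:
        payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise InvalidPermission("malformed token") from None
    payload = _b64decode(payload_b64)
    # 1. Signature first: no claim is trusted before this passes.
    if not verify(payload, _b64decode(sig_b64)):
        raise InvalidPermission("bad signature")
    claims = json.loads(payload)
    # 2. "For this long": reject expired permissions.
    if (time.time() if now is None else now) >= claims["exp"]:
        raise InvalidPermission("permission expired")
    # 3. "These channels": the client may open only what was granted.
    extra = requested - set(claims["channels"])
    if extra:
        raise InvalidPermission(f"channels not granted: {sorted(extra)}")
    return claims
```

The function checks the signature before parsing any claim, so a forged token never gets as far as having its expiry or channel list trusted.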

Conversely, compromising a build machine gets you whatever code is running in the sandbox right now, but the build machine cannot impersonate other users or issue permissions. Build machines are essentially “computers that run untrusted code inside a sandbox”, which is exactly the threat model sandboxes were built for.

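The data channels themselves are direct peer-to-peer when possible, so build logs and firmware binaries don’t transit the build server. When a connection does fall back to a relay server, the relay forwards only ciphertext, because WebRTC data channels are always encrypted end to end with DTLS. This means whoever runs the build server still cannot read your build logs or your firmware images, even if they wanted to.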
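The “direct when possible” behavior comes from ICE, the candidate-selection mechanism inside every WebRTC stack, not from espctl code. The sketch below uses the candidate and pair priority formulas and the recommended type preferences from RFC 8445. It shows that, with these values, any pair involving a relay candidate ranks below every pair without one. The selection rule is simplified, since real ICE nominates pairs after ordered connectivity checks. `Pair` and `select` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass

# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """RFC 8445 candidate priority: type pref, then local pref, then component."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(g: int, d: int) -> int:
    """RFC 8445 pair priority; g = controlling side's candidate, d = controlled side's."""
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


@dataclass
class Pair:
    local: str       # candidate type on the controlling side
    remote: str      # candidate type on the controlled side
    succeeded: bool  # did the connectivity check for this pair pass?

    def priority(self) -> int:
        return pair_priority(candidate_priority(self.local), candidate_priority(self.remote))


def select(pairs: list[Pair]) -> Pair | None:
    """Simplified nomination: the highest-priority pair whose check passed."""
    return max((p for p in pairs if p.succeeded), key=Pair.priority, default=None)
```

A relayed path is therefore used only when no direct or server-reflexive pair passes its connectivity check.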

ESPCTL MCP — User Manual
