Distributed agents over A2A — Experience

Experimental transport

Distributed agent execution rides on the A2A transport, which is built on a -preview package line and gated behind a runtime flag. When the flag is set to in-api, agents run in-process exactly as they always have. Nothing in the experience below changes between the two modes (that invisibility is the whole point), but operators should know the path is preview-staged and instantly reversible.

This doc describes what it feels like to run agents distributed across sandbox pods over A2A — for the person watching a run and for the operator running the platform. The short version: it feels like nothing changed. The transport is invisible by design. What changes is operational, not experiential.

For the design, read the A2A bridge deep dive. For the surface and security gates, read the A2A reference. For the pod lifecycle itself, see Sandbox pod execution and its experience doc.

1. The headline: runs behave identically to in-process

A run that executes its agent turns inside pods produces the same timeline, in the same order, with the same review gates as a run that executes in-process. From the user's seat:

The run timeline streams the same events, token-by-token, with no new "remote" markers and no gaps.
Review and confirmation gates appear at exactly the same points and behave identically — they pause the run, wait for a human, and resume.
A coordinator orchestration drafts the same OutcomeSpec, suspends at the same confirmation gate, and dispatches the same child runs.

There is no "distributed mode" toggle in the UI, no different progress bar, no transport indicator. A user cannot tell from the experience whether a turn ran in the worker process or in a pod across the cluster. That is the design goal, not a happy accident.

Figure: the user or operator watches the Web UI run timeline over the same SSE event stream and sees the same events whether the turn ran in the worker or in a pod.

2. Why the transport is invisible

The reason the experience is identical is structural, and it is worth understanding even from the user's side. Only the leaf agent turn moves into a pod. The orchestration graph — the part that raises timeline events and runs the review gates — stays in the worker. So:

The events you watch are produced by the same graph in the same place, whether or not the leaf turn is remote.
The review/confirm gates are graph constructs that live in the worker; they never travel over the wire, so they behave exactly as before.
The pod streams the turn's output back, and the worker re-injects it into the same event stream that feeds the browser.

In other words, the only thing that relocated is the heavy model work. The narrative of the run — its events, its gates, its ordering — never left home. See the coordinator orchestration experience for that narrative in detail; distributed execution does not alter it.
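The structural argument above can be sketched in a few lines of Python. This is an illustration only, not the platform's code: `orchestrate`, `run_in_worker`, and `run_in_pod` are hypothetical names, and the network hop is simulated by buffering and replaying the token stream. The point it shows is that the graph owns every event and the gate, so swapping the leaf executor leaves the timeline unchanged.

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical names throughout; a sketch of the idea, not the real implementation.
TurnExecutor = Callable[[str], Iterator[str]]


def run_in_worker(prompt: str) -> Iterator[str]:
    """Leaf turn executed in-process: yields output tokens one at a time."""
    for token in prompt.split():
        yield token.upper()


def run_in_pod(prompt: str) -> Iterator[str]:
    """Leaf turn executed remotely. The buffered list stands in for the
    stream a pod would send back over the wire."""
    streamed = list(run_in_worker(prompt))
    yield from streamed


def orchestrate(prompt: str, execute_turn: TurnExecutor) -> List[Dict[str, str]]:
    """The orchestration graph stays in the worker and raises every event."""
    events = [{"type": "turn_started"}]
    for token in execute_turn(prompt):
        # Output is re-injected into the same event stream either way.
        events.append({"type": "token", "text": token})
    # The review gate is a graph construct; the executor never sees it.
    events.append({"type": "review_gate"})
    return events


local_events = orchestrate("draft the outcome spec", run_in_worker)
remote_events = orchestrate("draft the outcome spec", run_in_pod)
assert local_events == remote_events
```

Because the executor is only a parameter of the graph, nothing downstream of `orchestrate` can distinguish the two modes.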

3. What actually changes — and it is operational

The differences are all on the operations side.

Aspect	In-process (`in-api`)	Distributed (`pod-per-run`)
Where a turn runs	In the worker process	In a warm AgentHost sandbox pod configured for the run
Isolation	Shared worker process	Kata-isolated pod, scoped credential, default-deny egress
Memory footprint	Worker holds every active session	Heavy SDK session lives and dies in the pod
Failure blast radius	A bad turn can pressure the worker	A bad turn is contained to its pod
What an operator watches	Worker pods	Worker pods plus sandbox pods

The operational wins are isolation and memory relief: the heavyweight model session leaves the worker process and runs in a disposable, isolated pod. A run no longer keeps a heavy session pinned in a shared process, which is the memory-pressure fix. And a misbehaving turn is contained inside its own Kata-isolated pod rather than sharing the worker's address space.

4. How to reason about it as an operator

A few mental models keep distributed execution easy to reason about.

The pod is disposable; the run is durable. Pods come and go. Durable resume does not live in the pod or in the A2A connection — it lives in the worker's checkpoint state (the worker's CheckpointManager plus the serialized session blob; the pod holds no database connection). When a run suspends on a review gate or a coordinator idles, the per-run pod can be checkpointed and released (RunWatchLoopService calls ReleaseAgentHostPodAsync when Sandbox:ReleasePodOnSuspend=true), and a fresh per-run pod is launched and rehydrated on resume. A user watching the run sees a normal pause at a gate, not a pod lifecycle event.
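A minimal Python sketch of that split between a disposable pod and durable worker state follows. The `Worker` and `Pod` classes, their methods, and the `release_pod_on_suspend` field are hypothetical stand-ins (the field mirrors the Sandbox:ReleasePodOnSuspend setting named above); the real checkpoint format and pod launch path are not shown in this doc.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Pod:
    pod_id: int
    session: bytes  # the heavy session lives only in the pod


@dataclass
class Worker:
    # Hypothetical stand-in for Sandbox:ReleasePodOnSuspend.
    release_pod_on_suspend: bool = True
    checkpoints: Dict[str, bytes] = field(default_factory=dict)
    pods: Dict[str, Pod] = field(default_factory=dict)
    next_pod_id: int = 1

    def launch(self, run_id: str, session: bytes = b"") -> Pod:
        """Start a per-run pod, optionally rehydrated from a session blob."""
        pod = Pod(self.next_pod_id, session)
        self.next_pod_id += 1
        self.pods[run_id] = pod
        return pod

    def suspend(self, run_id: str) -> None:
        """Checkpoint into worker-owned state, then release the pod if configured."""
        self.checkpoints[run_id] = self.pods[run_id].session
        if self.release_pod_on_suspend:
            del self.pods[run_id]

    def resume(self, run_id: str) -> Pod:
        """Reuse a live pod, or launch a fresh one from the checkpoint."""
        if run_id in self.pods:
            return self.pods[run_id]
        return self.launch(run_id, self.checkpoints[run_id])


worker = Worker()
first = worker.launch("run-1", session=b"serialized-session")
worker.suspend("run-1")            # run pauses at a gate; pod is released
second = worker.resume("run-1")    # fresh pod, same session state
assert second.pod_id != first.pod_id
assert second.session == b"serialized-session"
```

The pod identity changes across the suspend, but the run's state does not, which is why a watcher sees only a pause at a gate.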

A dropped connection re-drives a turn; it does not lose it. A2A's live stream has no mid-stream replay. If a pod or its connection drops mid-turn, the worker re-drives that turn from the last checkpoint. To a watcher this looks like the turn continuing; the timeline does not show duplicate events, because re-injection is idempotent. There is no manual recovery step for the common case.
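One way to picture idempotent re-injection is a timeline that ignores chunks it has already committed. The Python sketch below assumes each chunk carries a stable (turn id, sequence number) key and that the re-driven turn reproduces the same chunks; both are simplifying assumptions for illustration, and `Timeline`, `stream_turn`, and `drive_turn` are hypothetical names rather than platform APIs.

```python
from typing import Iterator, List, Optional, Set, Tuple


class Timeline:
    """Worker-side timeline that commits each (turn, seq) chunk at most once."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self._seen: Set[Tuple[str, int]] = set()

    def inject(self, turn_id: str, seq: int, text: str) -> bool:
        key = (turn_id, seq)
        if key in self._seen:
            return False  # already committed; a re-drive must not duplicate it
        self._seen.add(key)
        self.events.append(text)
        return True


def stream_turn(chunks: List[str], fail_after: Optional[int] = None) -> Iterator[Tuple[int, str]]:
    """Simulated live stream; optionally drops before emitting chunk `fail_after`."""
    for seq, text in enumerate(chunks):
        if fail_after is not None and seq == fail_after:
            raise ConnectionError("stream dropped")
        yield seq, text


def drive_turn(timeline: Timeline, turn_id: str, chunks: List[str],
               fail_after: Optional[int] = None) -> None:
    try:
        for seq, text in stream_turn(chunks, fail_after):
            timeline.inject(turn_id, seq, text)
    except ConnectionError:
        # No mid-stream replay: re-drive the whole turn from its starting point.
        for seq, text in stream_turn(chunks):
            timeline.inject(turn_id, seq, text)


timeline = Timeline()
drive_turn(timeline, "turn-1", ["a", "b", "c"], fail_after=2)
assert timeline.events == ["a", "b", "c"]
```

The first attempt commits two chunks before the drop; the re-drive replays all three, and only the missing one lands on the timeline.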

The transport is the sole wire, and the rollback is a flag. A2A is the only wire transport for agent turns. If anything goes wrong with it, the rollback is not "switch to another protocol" — it is Sandbox:AgentExecutionMode=in-api, which reverts to in-process execution instantly. Operators reason about one transport and one flag, not a matrix of fallback protocols.
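The one-flag rollback can be sketched as a single lookup that selects the turn executor. In this Python illustration the configuration key and the two mode values (in-api and pod-per-run) come from this doc; the executor functions and `select_executor` are hypothetical, and defaulting to in-api reflects the default described in this doc rather than a confirmed implementation detail.

```python
from typing import Callable, Dict, Mapping


def run_turn_in_process(prompt: str) -> str:
    """Hypothetical in-process executor."""
    return f"in-process:{prompt}"


def run_turn_in_pod(prompt: str) -> str:
    """Hypothetical distributed executor (would stream over A2A)."""
    return f"pod:{prompt}"


EXECUTORS: Dict[str, Callable[[str], str]] = {
    "in-api": run_turn_in_process,
    "pod-per-run": run_turn_in_pod,
}


def select_executor(config: Mapping[str, str]) -> Callable[[str], str]:
    """Pick the executor from one flag; there is no protocol fallback matrix."""
    mode = config.get("Sandbox:AgentExecutionMode", "in-api")
    try:
        return EXECUTORS[mode]
    except KeyError:
        raise ValueError(f"unknown execution mode: {mode!r}") from None


distributed = select_executor({"Sandbox:AgentExecutionMode": "pod-per-run"})
rolled_back = select_executor({"Sandbox:AgentExecutionMode": "in-api"})
assert distributed("hello") == "pod:hello"
assert rolled_back("hello") == "in-process:hello"
```

Rejecting unknown values outright, rather than silently falling back, keeps the operator's mental model at exactly two states.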

Every turn is authenticated to that run's pod. The worker path is API/worker → claim warm pod → POST /configure → RemoteAgentProxy → Authorization: Bearer {per-run token} → AgentHost message:stream. The token is generated at run launch, delivered by /configure, and accepted only by that pod. NetworkPolicy and mTLS still restrict who can reach the listener, but the turn endpoint also has application-layer bearer auth.
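The per-run token check can be illustrated with a small Python sketch. `AgentHostPod`, `launch_run`, and the integer status codes are hypothetical; only the flow (token generated at launch, delivered by /configure, accepted only by that pod) comes from this doc. The token length and the constant-time comparison are assumptions about how such a check might reasonably be written.

```python
import hmac
import secrets
from typing import Optional


class AgentHostPod:
    """Hypothetical pod that accepts turns only with its own per-run token."""

    def __init__(self) -> None:
        self._token: Optional[str] = None

    def configure(self, token: str) -> None:
        """Stands in for POST /configure delivering the per-run token."""
        self._token = token

    def message_stream(self, authorization: str) -> int:
        """Stands in for the turn endpoint; returns an HTTP-style status."""
        if self._token is None:
            return 401  # an unconfigured warm pod accepts no turns
        expected = f"Bearer {self._token}".encode()
        # Constant-time comparison avoids leaking token prefixes via timing.
        if hmac.compare_digest(authorization.encode(), expected):
            return 200
        return 401


def launch_run(pod: AgentHostPod) -> str:
    """Generate a fresh token at run launch and hand it to the claimed pod."""
    token = secrets.token_urlsafe(32)
    pod.configure(token)
    return token


pod_a, pod_b = AgentHostPod(), AgentHostPod()
token_a = launch_run(pod_a)
token_b = launch_run(pod_b)
assert pod_a.message_stream(f"Bearer {token_a}") == 200
assert pod_a.message_stream(f"Bearer {token_b}") == 401  # another run's token
```

A token from one run is useless against any other pod, which is the application-layer complement to the network-level restrictions.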

More pods to watch, same run model. The new operational surface is sandbox pods alongside worker pods. Their warm-pool sizing, isolation, and credential model are covered in the sandbox pods reference. The run timeline, review gates, and event stream you already know are unchanged.

Figure: turn lifecycle. An agent turn starts, the worker claims a warm AgentHost pod, calls POST /configure, and streams the turn over A2A until the turn output is committed. On suspend, the pod is checkpointed and released; on resume, a per-run pod is re-launched and rehydrated. On a dropped connection, the turn is re-driven from the last checkpoint.

5. Where you see it: Web UI, MCP, and diagnostics

Web UI. The run and board views are unchanged. The timeline, token streaming, and review/merge surfaces look and behave the same in both execution modes. There is no transport widget to learn.
MCP. The MCP tool surface that drives and observes runs is unchanged — starting, confirming, watching, steering, and reviewing a run work identically whether turns are in-process or distributed. (The MCP server itself is a separate inbound surface; see the MCP server deep dive and MCP client experience.)
Diagnostics. Where distributed execution does become visible is in operational diagnostics: pod counts, warm-pool state (including two pre-warmed AgentHost pods), per-run pod claims, and the execution-mode flag. These are operator-facing signals, not user-facing run state. East-west connectivity and the bridge's health belong to agent communication.

6. The one caveat to keep in mind

The transport is preview-staged. The experience is designed to be identical and the path is instantly reversible via the in-api flag, but the underlying A2A package line is -preview and pinned by version+hash until it reaches GA. For most users this is invisible; for operators it is the reason in-api stays the default until the distributed path completes soak. The honest framing — strong isolation and memory relief, bought with a pinned, flag-gated preview dependency and a one-flag rollback — is detailed in the A2A reference and the A2A bridge deep dive.
