The Context Window Problem

▶ Watch (2:45)

Cloudflare’s API defines everything the platform can do, and its OpenAPI spec runs to 2.3 million tokens. Exposing every endpoint as a tool would produce roughly 1.1 million tokens of tool definitions, which no model can handle. Splitting the API into multiple MCP servers by feature reduced the context per server but left coverage incomplete. The problem is not MCP itself: clients break when given 10,000 tools. Progressive disclosure became the goal.

Three Approaches to Progressive Disclosure

▶ Watch (4:35)

Carey demoed a project management API with 20 endpoints. Three strategies reduce context:

  • CLI: the agent runs bash commands, and individual commands load at runtime.
  • Tool search: semantic search picks the top 3 to 5 tools and loads them. This works for small tool sets.
  • Code mode: the agent writes JavaScript against a typed SDK and executes it in a sandbox, which gives bash-like composability plus return types.

The demo created projects, sprints, and tasks by chaining function calls.
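To make the code mode idea concrete, here is a minimal sketch of what chaining calls against a typed SDK could look like. The `ProjectApi` class, its method names, and the in-memory storage are all hypothetical stand-ins invented for illustration; the SDK used in the demo is not shown in this summary.

```typescript
// Hypothetical typed SDK for a project management API.
// Every name here is illustrative, not taken from the talk.
interface Project { id: number; name: string }
interface Sprint { id: number; projectId: number; name: string }
interface Task { id: number; sprintId: number; title: string }

class ProjectApi {
  private nextId = 1;
  readonly projects: Project[] = [];
  readonly sprints: Sprint[] = [];
  readonly tasks: Task[] = [];

  createProject(name: string): Project {
    const project: Project = { id: this.nextId++, name };
    this.projects.push(project);
    return project;
  }

  createSprint(projectId: number, name: string): Sprint {
    if (!this.projects.some((p) => p.id === projectId)) {
      throw new Error(`unknown project ${projectId}`);
    }
    const sprint: Sprint = { id: this.nextId++, projectId, name };
    this.sprints.push(sprint);
    return sprint;
  }

  createTask(sprintId: number, title: string): Task {
    if (!this.sprints.some((s) => s.id === sprintId)) {
      throw new Error(`unknown sprint ${sprintId}`);
    }
    const task: Task = { id: this.nextId++, sprintId, title };
    this.tasks.push(task);
    return task;
  }
}

// The kind of script an agent might write in code mode: each call's typed
// return value feeds the next call, with no extra model round trip in between.
function agentScript(api: ProjectApi): Task[] {
  const project = api.createProject("Website relaunch");
  const sprint = api.createSprint(project.id, "Sprint 1");
  return ["Design homepage", "Set up CI"].map((title) =>
    api.createTask(sprint.id, title),
  );
}

const api = new ProjectApi();
const created = agentScript(api);
console.log(created.map((t) => `${t.id}: ${t.title}`));
```

The return types are what separate this from chaining bash commands: the agent can read `project.id` and `sprint.id` directly instead of parsing text output.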

Safe Sandbox with V8 Dynamic Workers

▶ Watch (7:38)

Running untrusted agent code is risky. DSLs, Docker containers, VMs, and microVMs are too heavy for small snippets. Carey argued that V8 is the right sandbox. Cloudflare’s dynamic workers create a worker from a string with a 1 millisecond cold start. By default, that worker has no file system access, no environment variables, and no global network fetch, though the sandbox can be configured to allow specific outgoing fetches. All demos executed code this way, sending it to an external service that dispatched a dynamic worker.
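The "deny by default, allow specific outgoing fetches" posture can be illustrated with a small policy function. This is only a sketch of the idea and not the dynamic worker API: `makeOutboundPolicy`, the `OutboundDecision` type, the example hostnames, and the https-only rule are assumptions made for illustration.

```typescript
// Illustrative outbound policy only; this is NOT the Cloudflare dynamic worker API.
type OutboundDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Parse a URL, returning null instead of throwing on malformed input.
function parseUrl(raw: string): URL | null {
  try {
    return new URL(raw);
  } catch (_err) {
    return null;
  }
}

// Build a checker that permits requests only to explicitly listed hosts.
// An empty list means no network access at all.
function makeOutboundPolicy(allowedHosts: string[]) {
  const allow = new Set(allowedHosts.map((h) => h.toLowerCase()));
  return (rawUrl: string): OutboundDecision => {
    const url = parseUrl(rawUrl);
    if (url === null) {
      return { allowed: false, reason: "invalid URL" };
    }
    if (url.protocol !== "https:") {
      // Assumption for this sketch: plain http is never permitted.
      return { allowed: false, reason: "only https is permitted" };
    }
    if (!allow.has(url.hostname)) {
      return { allowed: false, reason: `host ${url.hostname} is not on the allowlist` };
    }
    return { allowed: true };
  };
}

// Default posture: nothing is reachable.
const noNetwork = makeOutboundPolicy([]);
// Configured posture: one hypothetical API host is reachable.
const apiOnly = makeOutboundPolicy(["api.example.com"]);

console.log(noNetwork("https://api.example.com/v1/projects"));
console.log(apiOnly("https://api.example.com/v1/projects"));
console.log(apiOnly("https://other.example.net/"));
```

A checker like this would sit in front of whatever performs the real request, so agent code never holds an unrestricted fetch.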

Code Mode over MCP

▶ Watch (14:50)

After seven months, few MCP clients have built-in code execution. Building a client is hard. Cloudflare shifted the execution into the MCP server itself. The server runs untrusted agent code against the underlying API. That reduces the whole Cloudflare API to a single MCP server using just 1,000 tokens of context at rest. Each capability is progressively disclosed as needed. Carey called it a “hack” that will do until all agents have built-in code interpreters, but it works now.
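A rough sketch of the server-side shape follows, assuming a two-tool surface named `search` and `execute` as described in the Q&A. Everything else is hypothetical: the endpoint catalog, the fake `client`, and `runAgentCode` are invented for illustration. In particular, `new Function` is used only as a stand-in for dispatching to a sandbox and provides no isolation whatsoever.

```typescript
// Hypothetical endpoint catalog standing in for a large API spec.
interface Endpoint { method: string; path: string; summary: string }

const catalog: Endpoint[] = [
  { method: "GET", path: "/projects", summary: "List projects" },
  { method: "POST", path: "/projects", summary: "Create a project" },
  { method: "GET", path: "/projects/{id}/sprints", summary: "List sprints in a project" },
  { method: "POST", path: "/projects/{id}/sprints", summary: "Create a sprint in a project" },
];

// Fake API client that records calls; a real server would issue
// authenticated HTTP requests here.
const calls: string[] = [];
const client = {
  request(method: string, path: string): { ok: boolean; method: string; path: string } {
    calls.push(`${method} ${path}`);
    return { ok: true, method, path };
  },
};

// WARNING: `new Function` is NOT a sandbox. It stands in for handing the
// code string to an isolated runtime, purely so this sketch can run locally.
function runAgentCode(code: string, argName: string, arg: unknown): unknown {
  return new Function(argName, code)(arg);
}

// The whole tool surface: two entries, regardless of how large the API is.
const tools = {
  // Agent code receives the catalog as `spec` and returns matching endpoints.
  search: (code: string) => runAgentCode(code, "spec", catalog),
  // Agent code receives the client as `api` and calls endpoints it found.
  execute: (code: string) => runAgentCode(code, "api", client),
};

const found = tools.search(
  `return spec.filter((e) => e.method === "POST" && e.path.includes("sprints"));`,
) as Endpoint[];

const result = tools.execute(
  `return api.request("POST", "/projects/1/sprints");`,
);

console.log(found, result, calls);
```

The catalog never enters the model's context in this arrangement. Only the two tool descriptions and whatever the agent's own search code returns do, which is where the progressive disclosure comes from.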

Q&A

Are you using V8 or optimizing it for your runtime? Dynamic workers run on workerd, an open-source runtime with heavily customized V8. ▶ Watch (19:49)

Does the MCP server expose only two tools to the client? Yes, search and execute; the agent writes code in search to find endpoints and in execute to call them. ▶ Watch (20:23)

Why not use a sub-agent to navigate the full API instead of code mode? The 2.3-million-token spec exceeds typical context windows, and sub-agents are heavy and do not scale. ▶ Watch (23:21)

Notable Quotes

“our OpenAPI spec is 2.3 million tokens” Matt Carey · ▶ Watch (2:45)

“this is a problem for the agents actually” Matt Carey · ▶ Watch (3:40)

“V8 is the right sandbox for this” Matt Carey · ▶ Watch (10:56)

“we didn’t have the right infrastructure primitives to run untrusted code and now we do” Matt Carey · ▶ Watch (15:29)

“building an MCP client is actually really really tough” Matt Carey · ▶ Watch (16:28)

Key Takeaways

  • Progressive disclosure via code mode reduces a 2.3-million-token API to 1,000 tokens of context at rest.
  • V8 dynamic workers provide a lightweight sandbox for executing untrusted agent code with 1ms cold start.
  • Until MCP clients support built-in code execution, server-side code mode bridges the gap.

About the Speaker

Matt Carey works on Agents and MCP at Cloudflare and maintains the official MCP TypeScript SDK. He builds infrastructure for agent developers. He is currently releasing v2 of the SDK. Before Cloudflare, he was a professional windsurfer and represented Malta at world and European championships.