Three Primitives for the Missing Middle

▶ Watch (0:55)

Kirschner defined the “missing middle” as the gap between a chat interface and a tool call. MCP fills it with three primitives. Structured input lets servers pause a tool call and ask clarifying questions. Life status tracks progress on tasks that run for minutes or weeks. MCP apps render interactive output users can drill into, sort, or drag. “It’s all about that in the middle and seeing progress and understanding what’s happening,” Kirschner said.

URL Elicitations Hand Off Control to the Server

▶ Watch (2:53)

Form elicitations landed first, letting servers define a JSON schema for direct human input. URL elicitations go further. They move the input step to a remote server, bypassing the agent entirely. Peet demoed loading hotel telemetry data into VS Code. The tool call triggered a URL elicitation. The spec requires the client to display the URL so users verify they are not being tricked by a malicious redirect. Common use cases include OAuth flows, cloud instance configuration, and confirmation wizards.

Tasks Keep Long-Running Work Alive

▶ Watch (8:09)

A tool call annotated as a task returns a task result instead of a tool result. The client tracks it and can reconnect after a network interruption. “Minutes, hours, or even weeks of time can be represented in this way,” Peet said. Tasks are experimental. Kirschner expects richer UIs for progress and input prompts as the spec evolves. The primitive also targets agentic orchestration systems, not just local clients.

MCP Apps Render Interactive Output Before the Tool Finishes

▶ Watch (12:55)

MCP apps use a sandboxed iframe and bidirectional postMessage. Peet showed a dashboard of his telemetry data with a histogram of Opus calls. The app displayed before the tool call completed. A button in the app triggered a second tool call that resolved the original one. “You can use this as a way to do a rich elicitation using your custom UI,” Peet said. Storybook, Figma, Amplitude, and tldraw have adopted MCP apps.

Sampling Saves 30k Tokens Per Call

▶ Watch (16:07)

When a tool returns large JSON output, VS Code writes it to a gzipped file and generates the JSON schema. The model reads only the first few hundred lines. Peet’s deep analysis used 20% of a 200k token window. “It probably saved at least 30k tokens,” he said. Sampling lets the server ask the client to run an inference call on its behalf. Kirschner called it a superpower: servers can summarize data before passing it to the agent, reducing context pollution without needing an external API key.

Tool Annotations, Server Instructions, and Dynamic Tools

▶ Watch (23:02)

Tool annotations mark tools as read-only, destructive, idempotent, or open-world. Clients use these hints to warn users about side effects or auto-run read-only tools. Server instructions inject a prompt into the system prompt to define the server’s contract. Dynamic tool updates let servers add tools progressively. Playwright’s “open browser” tool appears first; “click button” appears only after the browser is open. This reduces context window overhead.

Notable Quotes

“The missing middle isn’t static as the protocols. It’s more and more is possible. So don’t just do tool calls but think about the user experience of how your agents collaborate.” Harald Kirschner · ▶ Watch (26:02)

“Sampling is a superpower in the agent to potentially even run its own agentic loops without depending on another service having an API key sitting around somewhere but using the LMs that already exists in the client.” Harald Kirschner · ▶ Watch (11:17)

“I spend way more time with agents now in planning and review mode. So having those happening more visual is really a key part of my experience.” Harald Kirschner · ▶ Watch (20:36)

Key Takeaways

  • URL elicitations hand off OAuth and configuration flows directly to the server, bypassing the agent.
  • Tasks keep long-running work alive across network disconnections for minutes or weeks.
  • MCP apps render interactive iframes before the tool call finishes, enabling rich UIs for dashboards and flame graphs.
  • Sampling lets servers summarize data using the client’s model, saving 30k+ tokens per call.
  • Tool annotations and dynamic tool updates reduce context window overhead and improve safety.

About the Speaker(s)

Harald Kirschner is a Principal Product Manager at Microsoft, building AI coding experiences in VS Code and GitHub Copilot for 40+ million developers. Before Microsoft, he led Firefox DevTools at Mozilla and helped ship Firefox Quantum. His engineering roots (MooTools, early web…

Connor is a principal software engineer working on VS Code since 2019.