The Human Onboarding Problem Mirrors Agent Context Bloat

▶ Watch (3:14)

New employees face a pile of tools and no tribal knowledge. Agents have it worse: they have no memory of past sessions. Every MCP server added to the system prompt contributes its tool definitions, so it consumes more of the context window. Guerrero noted that the system prompt alone can take up to 70% of the context window. The result is higher latency, higher token cost, and degraded reasoning. In his demo, adding a Chuck Norris joke server raised token consumption for a trivial request from 26 to 100 tokens. Adding GitHub and Jira servers "just in case" then pushed the same request to 3,000 tokens.
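To see why tool definitions inflate the prompt, the sketch below serializes a few tool schemas the way a client might place them in context and estimates their size. It assumes the common shape of an MCP tool listing (`name`, `description`, `inputSchema`). The tools themselves are hypothetical and the 4-characters-per-token rule is only a rough heuristic, so the numbers will not match the talk's measurements. Only the trend matters here.

```python
import json

# Hypothetical tool definitions, loosely shaped like an MCP tool listing
# (name, description, inputSchema). They are not taken from the talk.
JOKE_TOOLS = [
    {
        "name": "tell_joke",
        "description": "Return a random Chuck Norris joke.",
        "inputSchema": {"type": "object", "properties": {}},
    },
]

# A handful of hypothetical GitHub tools that all share one schema shape.
GITHUB_TOOLS = [
    {
        "name": f"github_{action}",
        "description": f"GitHub: {action.replace('_', ' ')} in a repository.",
        "inputSchema": {
            "type": "object",
            "properties": {"owner": {"type": "string"}, "repo": {"type": "string"}},
            "required": ["owner", "repo"],
        },
    }
    for action in ["list_issues", "create_issue", "list_branches", "delete_branch"]
]


def estimate_tokens(text: str) -> int:
    """Rough heuristic of ~4 characters per token; real counts depend on the tokenizer."""
    return max(1, len(text) // 4)


def prompt_tokens(user_prompt: str, tools: list[dict]) -> int:
    # Tool schemas travel in the context next to the user's request,
    # whether or not the request ever needs them.
    return estimate_tokens(user_prompt) + estimate_tokens(json.dumps(tools))


prompt = "Tell me a Chuck Norris joke"
for label, tools in [
    ("no tools", []),
    ("joke server", JOKE_TOOLS),
    ("joke + GitHub", JOKE_TOOLS + GITHUB_TOOLS),
]:
    print(f"{label:>14}: ~{prompt_tokens(prompt, tools)} tokens")
```

The request text never changes. Only the tool list grows, and every added schema is paid for on every call.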

MCP Gateways Filter Tools by Identity

▶ Watch (12:10)

Guerrero demonstrated an MCP gateway placed between agents and servers. Without authentication, all tools (including destructive ones like delete branch) were visible. After enabling JWT-based authentication, a read-only user (Alice) saw only safe tools. Her agent consumed half the tokens of an unauthenticated agent. A super-admin (Bob) saw all tools again. The gateway enforced role-based tool visibility without modifying client or server code.
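The gateway's filtering step can be sketched as a pure function from verified identity claims to a visible tool list. This is a minimal illustration and does not reproduce the gateway product shown in the talk. It assumes that JWT signature, expiry, and audience checks have already happened upstream. The `roles` claim, the role names, the tool catalog, and the `destructive` flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tool:
    name: str
    destructive: bool  # hypothetical flag: does the tool mutate or delete data?


# Hypothetical catalog the gateway has aggregated from upstream MCP servers.
UPSTREAM_TOOLS = [
    Tool("list_branches", destructive=False),
    Tool("get_issue", destructive=False),
    Tool("create_issue", destructive=True),
    Tool("delete_branch", destructive=True),
]


def allowed(tool: Tool, roles: set[str]) -> bool:
    """Hypothetical role policy; the role names are illustrative only."""
    if "super-admin" in roles:
        return True
    if "read-only" in roles:
        return not tool.destructive
    return False  # deny by default for unrecognized roles


def visible_tools(claims: Optional[dict], auth_enabled: bool = True) -> list[str]:
    """Names of the tools a caller may see.

    `claims` is assumed to be the payload of a JWT the gateway has already
    verified; None means the caller presented no token.
    """
    if not auth_enabled:
        # Like the demo before authentication was enabled: everything is exposed.
        return [t.name for t in UPSTREAM_TOOLS]
    if claims is None:
        return []  # authentication is on, but no token was presented
    roles = set(claims.get("roles", []))
    return [t.name for t in UPSTREAM_TOOLS if allowed(t, roles)]


alice = {"sub": "alice", "roles": ["read-only"]}
bob = {"sub": "bob", "roles": ["super-admin"]}
print(visible_tools(alice))  # ['list_branches', 'get_issue']
print(visible_tools(bob))    # all four tools
```

The filtering happens in the layer between agent and servers, so neither the client nor the upstream servers need to change. This matches the property Guerrero emphasized.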

Registries Enable Just-in-Time Tool Discovery

▶ Watch (16:20)

An MCP registry provides service discovery and endpoint resolution. Developers can browse servers, copy URLs, and configure agents manually. For autonomous agents, the registry itself exposes an MCP server. The agent queries it at runtime to find which servers are available and relevant. Guerrero showed a Volcano-based agent that first calls the registry, then injects only the server needed for the current task. This avoids bloating the prompt with tools that might never be used.
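The discover-then-inject flow can be sketched with an in-memory stand-in for the registry. This is not the registry's real MCP interface, and it does not use the Volcano SDK. The `Registry.search` method, the tag-based keyword matching, and the `example.com` endpoints are hypothetical and serve only to show the order of operations: ask first, then connect only to what was returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerEntry:
    name: str
    url: str  # hypothetical endpoint that the registry resolves at runtime
    tags: frozenset[str]


class Registry:
    """In-memory stand-in for an MCP registry (hypothetical API)."""

    def __init__(self, entries: list[ServerEntry]):
        self._entries = entries

    def search(self, task: str) -> list[ServerEntry]:
        # Naive keyword match between the words of the task and each server's tags.
        words = {w.strip(".,!?").lower() for w in task.split()}
        return [e for e in self._entries if e.tags & words]


REGISTRY = Registry([
    ServerEntry("jokes", "https://jokes-mcp.example.com/mcp", frozenset({"joke", "chuck", "norris"})),
    ServerEntry("github", "https://github-mcp.example.com/mcp", frozenset({"github", "branch", "repo"})),
    ServerEntry("jira", "https://jira-mcp.example.com/mcp", frozenset({"jira", "ticket", "sprint"})),
])


def servers_for_task(task: str) -> list[str]:
    """Step 1: query the registry. Step 2: connect only to the servers it returns."""
    return [e.url for e in REGISTRY.search(task)]


print(servers_for_task("Tell me a Chuck Norris joke"))
print(servers_for_task("Open a Jira ticket for the failing branch"))
```

A joke request resolves to a single server, so the GitHub and Jira tool schemas never enter the prompt. A production registry would likely rank servers by metadata or embeddings rather than exact tag matches.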

Smarter Context, Not Bigger Prompts

▶ Watch (23:06)

The solution is not hacking prompts to fit more tools. Guerrero advocates for a “context mesh”: an abstraction layer that semantically routes tool requests. The combination of a registry (dynamic discovery) and a gateway (authorization filtering) reduces tool count per request. Tokens are spent only on tools the agent actually needs. Guerrero said this pattern has been successful at the enterprise level.
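A toy version of the "context mesh" idea combines the two filters per request. The authorization check drops tools the caller may not use. A relevance score then keeps only the few tools that match the request. This is an illustration under stated assumptions, not Guerrero's implementation. Bag-of-words cosine similarity stands in for a real embedding model, and the catalog, role names, `top_k`, and `min_score` values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical catalog: tool name -> (description, roles allowed to see it).
CATALOG = {
    "tell_joke": ("tell a random chuck norris joke", {"read-only", "super-admin"}),
    "list_branches": ("list branches in a github repository", {"read-only", "super-admin"}),
    "delete_branch": ("delete a branch from a github repository", {"super-admin"}),
    "create_jira_ticket": ("create a jira ticket in a project", {"super-admin"}),
}


def vectorize(text: str) -> Counter:
    # Bag-of-words vector; a stand-in for an embedding model.
    return Counter(w.strip(".,!?").lower() for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(request: str, roles: set[str], top_k: int = 2, min_score: float = 0.2) -> list[str]:
    """Return at most top_k tools that are both authorized and relevant."""
    query = vectorize(request)
    scored = []
    for name, (description, allowed_roles) in CATALOG.items():
        if not roles & allowed_roles:
            continue  # gateway step: hide tools the caller may not use
        score = cosine(query, vectorize(description))
        if score >= min_score:  # registry/routing step: skip irrelevant tools
            scored.append((score, name))
    scored.sort(reverse=True)  # most relevant first
    return [name for _, name in scored[:top_k]]


request = "Delete the stale branch in our GitHub repository"
print(route(request, {"read-only"}))    # ['list_branches']
print(route(request, {"super-admin"}))  # ['delete_branch', 'list_branches']
```

Each request carries at most `top_k` tool schemas instead of the whole catalog. Tokens therefore scale with what the task needs rather than with how many servers exist.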

Notable Quotes

“We can have up to 70% of the context window just for the system prompt” Hugo Guerrero · ▶ Watch (3:22)

“The token consumption went from 26 tokens to 100 tokens. Well, that’s not that much, right? So, it could be something because, you know, the tell me a Chuck Norris joke is a very simple tool. The MCP server should just, you know, be um a few uh tokens length. But when I add simple tools, you know, every day um programmers and developers um process and I suddenly need to say, okay, I need GitHub, I need Jira, and I then rerun the same prompt because we are adding those tools just in case that the prompt will require access to them. We are bloating the context window to 3,000 tokens just by adding the tools just in case.” Hugo Guerrero · ▶ Watch (6:10)

“I didn’t have to change at all the code on the client or the MCP server.” Hugo Guerrero · ▶ Watch (16:00)

“It’s really not bigger prompts or trying to hack the prompt to be able to make them uh work around it. It’s basically make it a smarter context being able to decrease the tool count, inject them in time.” Hugo Guerrero · ▶ Watch (23:13)

Key Takeaways

  • Context bloat increases latency, token cost, and risk of hallucination.
  • An MCP gateway enforces tool visibility per identity without code changes.
  • An MCP registry lets agents discover and inject only needed tools at runtime.

About the Speaker(s)

Hugo Guerrero is a tech leader, speaker, and architect obsessed with AI, APIs, and the systems that connect them. From scaling developer ecosystems to mastering event-driven architecture, he focuses on making agentic connectivity a practical reality for modern enterprises.