MCP Servers in the Wild: Managing Tool Complexity at Scale - Arnav Balyan, Concierge AI

by Arnav Balyan

Day 1 · Broadway Ballroom South (6th Floor) · MCP Best Practices · 18 min · 3 min read

Applied AI

TL;DR

MCP downloads hit 130 million in February 2026, a 250x increase since November 2024. Beyond 20 tools, tool descriptions saturate context windows (550 tools fill 200k tokens) and state-changing use cases demand server-guided tool selection. Arnav Balyan demonstrated four patterns: progressive disclosure, code execution, plan-based execution, and tool search. A study of 50+ MCP servers found plain mode fastest under 22 tools, search best for stateless over 50, and planner/code for dependencies. Concierge AI's open-source SDK converts servers into these patterns with one line of code.

Two Problems Emerge Beyond 20 Tools

▶ Watch (0:02)

MCP downloads reached 130 million in February 2026, a 250x increase since November 2024. As tool counts grow beyond 20, two problems appear. First, tool descriptions and schemas become large: at 550 tools, they saturate a 200k-token context window. Second, use cases shift from analytical to transactional. Users now ask agents to book hotels or make payments, where a wrong or failed action is hard to reverse. Servers must therefore guide tool selection instead of leaving agents to guess.
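The saturation figure can be turned into a rough per-tool budget. The sketch below uses only the two numbers quoted in the talk (550 tools, 200k tokens) and assumes every tool costs the same number of tokens, which real schemas will not. The names are illustrative.

```python
# Figures from the talk summary: 550 tools fill a 200k-token context window.
CONTEXT_WINDOW_TOKENS = 200_000
TOOLS_AT_SATURATION = 550

# Assumption: uniform cost per tool definition (real schemas vary widely).
TOKENS_PER_TOOL = CONTEXT_WINDOW_TOKENS / TOOLS_AT_SATURATION


def context_share(tool_count: int) -> float:
    """Fraction of the window consumed by tool definitions alone (capped at 1.0)."""
    return min(1.0, tool_count * TOKENS_PER_TOOL / CONTEXT_WINDOW_TOKENS)


if __name__ == "__main__":
    print(f"about {TOKENS_PER_TOOL:.0f} tokens per tool")
    for n in (20, 100, 550):
        print(f"{n:>3} tools -> {context_share(n):.1%} of the window")
```

Under that uniform-cost assumption, each tool definition averages roughly 364 tokens, so the window is being spent on tool definitions well before the 550-tool limit is reached.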

Progressive Disclosure Turns Tool Lists into Trees

▶ Watch (3:02)

Instead of exposing all tools as a flat list, servers group them into tool groups. The model sees five groups instead of 100 tools. Invoking a tool group triggers a tool update notification. The client re-lists tools and receives the actual tools inside that group. This converts a list into a tree. Servers can enforce dependencies: an e-commerce workflow requires login before checkout. The server only reveals the next group after the previous one is invoked. The model never sees the full graph, only one node at a time.
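A toy model of this flow, written in plain Python rather than against any MCP SDK, might look as follows. The class names, the group names, and the `requires` field are hypothetical. The notification string is the method name the MCP specification uses for tool list changes; everything else is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToolGroup:
    name: str
    tools: list
    requires: Optional[str] = None  # group that must be invoked immediately before


@dataclass
class DisclosureServer:
    groups: dict
    current: Optional[str] = None  # node of the tree the model is at
    notifications: list = field(default_factory=list)

    def list_tools(self) -> list:
        """Return only the current node: its tools plus the groups it unlocks."""
        unlocked = [g.name for g in self.groups.values() if g.requires == self.current]
        if self.current is None:
            return unlocked
        return list(self.groups[self.current].tools) + unlocked

    def invoke_group(self, name: str) -> None:
        """Enter a group, but only if its prerequisite is the current node."""
        group = self.groups.get(name)
        if group is None or group.requires != self.current:
            raise PermissionError(f"group {name!r} is not available yet")
        self.current = name
        # Tells the client to re-list tools.
        self.notifications.append("notifications/tools/list_changed")


def build_demo_server() -> DisclosureServer:
    # Hypothetical e-commerce workflow: login must come before checkout.
    groups = {
        "login": ToolGroup("login", ["sign_in"]),
        "checkout": ToolGroup("checkout", ["pay", "confirm_order"], requires="login"),
    }
    return DisclosureServer(groups)


if __name__ == "__main__":
    server = build_demo_server()
    print(server.list_tools())   # only the root groups
    server.invoke_group("login")
    print(server.list_tools())   # login tools plus the newly unlocked group
```

In this sketch, asking for `checkout` before `login` raises an error, so the ordering is enforced by the server rather than left to the model.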

Code Execution and Plan-Based Patterns Reduce Token Waste

▶ Watch (8:33)

Code execution reduces hundreds of tools to a single sandbox tool. The model writes Python code that imports tool stubs and processes outputs. A 200-page Notion doc stays in a variable instead of the context window. The model can sort, search, and send refined data to the next tool reliably. Drawbacks include the cost of running the sandbox and its security risks. Plan-based execution offers an alternative: the model writes a JSON execution plan in which later steps reference the outputs of earlier tool calls. Both patterns convert hundreds of tools into two tools: a search tool and an execute tool.
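The plan-based idea can be sketched with a small interpreter. The plan format here (a list of steps with `id`, `tool`, and `args`, where a `"$id"` string refers to an earlier output) is an assumption for illustration, not the format shown in the talk, and `fetch_doc` and `grep` are hypothetical stand-in tools.

```python
import json


def fetch_doc(doc_id: str) -> list:
    """Hypothetical tool with a large output, standing in for a long document."""
    lines = [f"line {i}: status ok" for i in range(1000)]
    lines.append("line 1000: bug in checkout")
    return lines


def grep(lines: list, needle: str) -> list:
    """Hypothetical tool that filters lines containing a substring."""
    return [line for line in lines if needle in line]


TOOLS = {"fetch_doc": fetch_doc, "grep": grep}


def execute_plan(plan_json: str):
    """Run steps in order; a "$<id>" argument is replaced by that step's output."""
    results = {}
    last = None
    for step in json.loads(plan_json):
        args = {}
        for key, value in step["args"].items():
            if isinstance(value, str) and value.startswith("$"):
                value = results[value[1:]]  # intermediate data stays server-side
            args[key] = value
        last = results[step["id"]] = TOOLS[step["tool"]](**args)
    return last  # only the final step's output goes back to the model


# A plan the model might write (assumed format).
PLAN = json.dumps([
    {"id": "doc", "tool": "fetch_doc", "args": {"doc_id": "notion-123"}},
    {"id": "hits", "tool": "grep", "args": {"lines": "$doc", "needle": "bug"}},
])

if __name__ == "__main__":
    print(execute_plan(PLAN))
```

The 1,001-line document never enters the model's context here; only the single matching line is returned. A code-execution sandbox achieves the same effect with ordinary variables, at the sandbox cost the talk mentions.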

Study of 50+ Servers Reveals When Each Pattern Wins

▶ Watch (13:22)

Balyan’s team tested 50+ MCP servers from GitHub, Slack, Copilot, and others. Tasks included finding a bug in Slack, checking Notion, and raising a GitHub PR. Under 22 tools, plain mode is fastest and simplest. Over 50 stateless tools, search mode works best. For tools with dependencies, planner or code mode is the better fit. Plain mode is fastest in completion time but uses the most tokens. Code mode is the most token-efficient. Token savings translate to cost savings at scale. The team will publish full numbers after the conference.
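These findings can be read as a rough decision rule. The function below is one possible encoding, not the team's published method. The thresholds come from the summary above; the order of the checks and the handling of the unreported range between 22 and 50 stateless tools are assumptions.

```python
from typing import Optional

PLAIN_MAX_TOOLS = 22   # plain mode reported fastest under 22 tools
SEARCH_MIN_TOOLS = 50  # search mode reported best over 50 stateless tools


def suggest_mode(tool_count: int, has_dependencies: bool) -> Optional[str]:
    """Map the reported findings to a mode, or None where no finding applies."""
    # Assumption: the small-server finding takes priority over dependencies.
    if tool_count < PLAIN_MAX_TOOLS:
        return "plain"
    if has_dependencies:
        return "planner_or_code"
    if tool_count > SEARCH_MIN_TOOLS:
        return "search"
    # 22 to 50 stateless tools: the summary reports no result for this range.
    return None


if __name__ == "__main__":
    for count, deps in [(10, False), (120, False), (120, True), (30, False)]:
        print(count, deps, suggest_mode(count, deps))
```

Returning `None` for the middle range keeps the sketch from asserting a recommendation the summary does not make.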

Notable Quotes

"beyond 550 tools, you hit the 200k context window" · Arnav Balyan · ▶ Watch (1:24)

"the server can give its own opinions" · Arnav Balyan · ▶ Watch (2:22)

"you convert this list-based structure into a tree" · Arnav Balyan · ▶ Watch (4:08)

"with code, it’s always guaranteed" · Arnav Balyan · ▶ Watch (10:19)

"under 22 tools, plain mode is the best" · Arnav Balyan · ▶ Watch (14:12)

Key Takeaways

Progressive disclosure reduces context saturation by converting tool lists into navigable trees.
Code execution and plan-based patterns cut token usage by keeping large outputs in variables.
A study of 50+ MCP servers shows plain mode fastest under 22 tools, search best for more than 50 stateless tools, and planner or code mode for tools with dependencies.

About the Speaker

Arnav Balyan is the founder of Concierge AI. He previously worked at Uber building MCP systems at scale. Concierge AI manages over 400 public MCP deployments. His research focuses on MCP tool complexity and token overhead reduction.
