Sponsored Session: The Self-Improving MCP Server: Agents in a Live Development Loop - Enrico Toniato, Manufact

by Enrico Toniato

Day 1 · Marquis Ballroom (9th Floor) · Apps and Agents · 18 min · 3 min read

Applied AI · AI Infrastructure

TL;DR

Enrico Toniato demonstrated Manufact's self-improving MCP server loop. The system uses agents to test MCP servers across clients and models at three stages: development, CI/CD, and continuous monitoring. The open-source mcp-use framework provides live HMR (hot module replacement) for protocol changes and MCP apps widgets. Deployment on Manufact Cloud is free for solo developers.

The Pain of Testing MCP Servers

▶ Watch (2:02)

Building an MCP server takes seconds with mature SDKs and frameworks like mcp-use. Testing it across clients and models is the hard part. Toniato said that clients like Claude release new features daily without testing them, and that models from GPT-5.4 down to free-tier models handle tool flows differently. As an example, he said the Figma MCP server on OpenAI’s marketplace consistently fails with timeouts. He compared the situation to testing websites on Internet Explorer in 2008.

Why Agents Should Test Agents

▶ Watch (6:33)

The user persona of an MCP server is an agent, not a human. Testing therefore needs three levels: protocol correctness (tools return the right data), behavioral consistency (tool calls work across models and clients), and visual rendering (MCP apps widgets display correctly). Because clients change their APIs and UIs daily, an MCP server tends to break from external changes rather than internal bugs. Toniato’s solution lets the development agent automatically connect to the server it is building, call its tools, and interact with a sandbox chat to find mistakes before deployment.
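
The first of those levels, protocol correctness, can be illustrated with a plain shape check on a tool result. This is a minimal sketch rather than Manufact's implementation: it assumes the result is a dict in the MCP tool-result shape (a `content` list of typed items plus an optional `isError` flag), and the function name `check_tool_result` is hypothetical.

```python
from typing import Any


def check_tool_result(result: dict[str, Any]) -> list[str]:
    """Return a list of protocol-level problems found in a tool result.

    Assumes the MCP tool-result shape: a `content` list of typed items
    and an optional boolean `isError`. An empty list means no problems.
    """
    problems: list[str] = []

    content = result.get("content")
    if not isinstance(content, list):
        problems.append("content must be a list")
        content = []

    for index, item in enumerate(content):
        if not isinstance(item, dict) or "type" not in item:
            problems.append(f"content[{index}] is missing a type")
        elif item["type"] == "text" and not isinstance(item.get("text"), str):
            problems.append(f"content[{index}] is a text item without text")

    if not isinstance(result.get("isError", False), bool):
        problems.append("isError must be a boolean")
    elif result.get("isError"):
        problems.append("tool reported an error")

    return problems
```

A check like this covers only the protocol level. Behavioral consistency and widget rendering need an agent or client in the loop, which is the point of the talk.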

Three Stages of Self-Improving Testing

▶ Watch (8:50)

Manufact implements testing at three stages. During development, the agent edits files and tests tools live through the open-source mcp-use inspector. In CI/CD, the pipeline runs behavioral tests on every GitHub push and suggests fixes for bugs that appear on one model but not another. In continuous monitoring, tests run on a schedule against real clients like Claude. All three stages run autonomously while keeping a human in the loop.
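
The CI/CD idea of catching bugs that show up on one model but not another can be sketched as a comparison over per-model test outcomes. The snippet below is an illustration under assumptions, not the pipeline Toniato showed: the `results` layout, the model names, and the function `model_specific_failures` are all hypothetical.

```python
def model_specific_failures(
    results: dict[str, dict[str, bool]],
) -> dict[str, list[str]]:
    """Map each flow to the models that fail it while another model passes.

    `results[model][flow]` is True when the flow passed on that model.
    Flows that fail everywhere are left out: they point to a server bug
    rather than a model-specific difference.
    """
    flows = sorted({flow for per_model in results.values() for flow in per_model})
    report: dict[str, list[str]] = {}
    for flow in flows:
        # Only compare models that actually ran this flow.
        outcomes = {
            model: per_model[flow]
            for model, per_model in results.items()
            if flow in per_model
        }
        failing = sorted(model for model, passed in outcomes.items() if not passed)
        if failing and len(failing) < len(outcomes):
            report[flow] = failing
    return report


# Hypothetical outcomes for two models and three flows.
example = {
    "model-a": {"search": True, "checkout": True, "export": False},
    "model-b": {"search": True, "checkout": False, "export": False},
}
print(model_specific_failures(example))
```

Separating model-specific failures from failures that occur everywhere keeps the suggested fixes focused on behavioral differences rather than on ordinary server bugs.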

Open Source Tools and Deployment

▶ Watch (13:05)

The npx create-mcp-use-app command generates boilerplate for MCP servers and MCP apps widgets. A Vercel-curated skill lets Claude or Cursor build the server without the developer writing code. The inspector provides live HMR for protocol changes and MCP apps, a feature that, according to Toniato, the official SDK lacks. Manufact Cloud is free for solo developers with unlimited servers, analytics, and observability.

Q&A

Do you store transcripts of the interaction? The open-source version runs locally with no connection to Manufact systems. ▶ Watch (15:43)

How do you configure the CI/CD test flows? Through the dashboard, you specify the flow to test and mark checks like tool calls or widget returns. ▶ Watch (16:43)
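
The flow-plus-checks configuration described in that answer is set up in a dashboard, but the same idea can be sketched as data. This is a hypothetical model, not Manufact's schema: `FlowSpec`, `Trace`, `evaluate`, and the tool names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FlowSpec:
    """Hypothetical description of one flow to test."""
    prompt: str
    expected_tools: list[str] = field(default_factory=list)
    expect_widget: bool = False


@dataclass
class Trace:
    """Hypothetical record of what the agent did during one run."""
    tools_called: list[str]
    widget_returned: bool


def evaluate(spec: FlowSpec, trace: Trace) -> list[str]:
    """Return the checks that failed; an empty list means the flow passed."""
    failures = [
        f"tool not called: {tool}"
        for tool in spec.expected_tools
        if tool not in trace.tools_called
    ]
    if spec.expect_widget and not trace.widget_returned:
        failures.append("widget not returned")
    return failures


# Example flow with two expected tool calls and a widget check.
spec = FlowSpec("Book a table for two", ["search_restaurants", "book_table"], True)
print(evaluate(spec, Trace(["search_restaurants"], False)))
```

Keeping the checks declarative means the same flow can be replayed against different models and clients without rewriting the test.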

Notable Quotes

the persona of an MCP server is the agent, right? So, it’s actually more natural that this the agent that is going to test the MCP server because it’s going to be the user of that MCP server Enrico Toniato · ▶ Watch (5:58)

some of them are actually the Internet Explorer of 2026, right? Enrico Toniato · ▶ Watch (5:38)

we are the only framework that is actually doing that, right? like also the official SDK doesn’t have HMR on the protocol level or on the MCP apps Enrico Toniato · ▶ Watch (14:44)

Key Takeaways

MCP servers tend to break because of differences across clients and models, not because of bad server code.
Agents are the natural testers for MCP servers.
Manufact’s open-source mcp-use framework provides live HMR for protocol changes.
Three-stage testing covers development, CI/CD, and continuous monitoring.
Manufact Cloud offers free deployment with analytics for solo developers.

About the Speaker(s)

Enrico Toniato is CTO at Manufact (formerly mcp-use). He was previously AI tech lead at IBM Research, where he achieved state-of-the-art results in Text2SQL. He has also presented at NeurIPS and worked on robotics at ETH Zurich.
