Reinforcement Learning Changed What Models Can Do

▶ Watch (2:08)

Post-training with reinforcement learning lets models pursue bigger goals. Smith described a reward function that scores a model’s trajectory and reinforces good behavior. The result: models can sustain very long token sequences. On SWE-bench, a minimal harness with 100 lines of code and a single free-form, non-JSON shell tool scores 76%. State-of-the-art models in a full harness reach about 80%. That single shell tool changed the dynamics for standard I/O (stdio) services.
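The minimal-harness idea can be sketched as a loop that gives the model one free-form shell tool. This is an illustrative sketch, not the harness Smith cited: `run_agent`, `extract_command`, the `$ ` line convention, and the scripted stand-in model are all assumptions made for the example.

```python
import subprocess
from typing import Callable, List, Optional


def extract_command(reply: str) -> Optional[str]:
    """Return the first line that starts with '$ ' (a hypothetical convention)."""
    for line in reply.splitlines():
        if line.startswith("$ "):
            return line[2:].strip()
    return None


def run_agent(model: Callable[[List[str]], str], task: str, max_steps: int = 10) -> List[str]:
    """Loop: ask the model, run any shell command it proposes, feed the output back."""
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        reply = model(transcript)
        transcript.append(f"MODEL: {reply}")
        command = extract_command(reply)
        if command is None:
            break  # no command means the model considers the task finished
        # Warning: this runs arbitrary shell input; sandbox it in real use.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=30
        )
        transcript.append(f"OUTPUT: {result.stdout}{result.stderr}")
    return transcript


def scripted_model(transcript: List[str]) -> str:
    """Stand-in for a real model: issue one command, then stop."""
    if not any(entry.startswith("OUTPUT:") for entry in transcript):
        return "I will check the shell.\n$ echo hello"
    return "Done: the shell printed hello."


if __name__ == "__main__":
    for entry in run_agent(scripted_model, "say hello"):
        print(entry)
```

The model's reply is plain text rather than a JSON tool call, so the only parsing the harness does is finding the command line.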

Dynamic Tool Calling at Hugging Face

▶ Watch (5:11)

Hugging Face gives users access to thousands of machine learning models across audio, video, text, and image generation. Smith showed how models dynamically look at the environment and, in a 45-token tool situation, call any of those models. The model writes its own tool descriptions and brings them into context. MCP provides authentication for correct quality of service and multi-model support. The image generation demo used an open-weight model with a base language model conducting the search.
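One way to picture dynamic tool calling is a small search step that pulls only the matching tool descriptions into context on demand. The sketch below is a toy in-memory version under stated assumptions: `CATALOG`, `search_tools`, and `Context` are hypothetical names, and nothing here calls the real Hugging Face or MCP APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tool:
    name: str
    description: str


# Hypothetical catalog standing in for thousands of hosted models.
CATALOG = [
    Tool("text-to-image", "generate an image from a text prompt"),
    Tool("speech-to-text", "transcribe audio into text"),
    Tool("text-to-video", "generate a video clip from a text prompt"),
]


def search_tools(query: str, catalog: List[Tool]) -> List[Tool]:
    """Return tools whose description shares at least one word with the query."""
    words = set(query.lower().split())
    return [t for t in catalog if words & set(t.description.lower().split())]


@dataclass
class Context:
    """Tool descriptions currently visible to the model."""
    tools: Dict[str, str] = field(default_factory=dict)

    def load(self, found: List[Tool]) -> None:
        for tool in found:
            self.tools[tool.name] = tool.description


if __name__ == "__main__":
    ctx = Context()
    ctx.load(search_tools("image", CATALOG))
    print(ctx.tools)
```

Only the tools that match the search enter the context, so the model does not carry descriptions for the whole catalog up front.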

Code Execution and the Generation-Execution Split

▶ Watch (8:28)

Models now write and execute code natively. Smith demonstrated a query tool where the model wrote Python code in response to a prompt, executed it, and returned specific results. His laptop wasn’t powerful enough for a local model, but the setup works with both remote and local models. A small customized model of about 1,000 tokens handled the job. MCP gives the option to deploy to remote execution environments and make speed-time trade-offs while keeping a known API surface.
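The generation-execution split can be sketched as two swappable functions behind one call. This is a minimal sketch with assumed names: `generate_code` is a stub standing in for a model, and `local_executor` runs the code in a subprocess; a remote executor with the same signature could be substituted without changing `run_query`.

```python
import subprocess
import sys
from typing import Callable


def generate_code(prompt: str) -> str:
    """Stand-in for a model that answers a prompt with Python source."""
    return "print(sum(range(1, 11)))"


def local_executor(code: str) -> str:
    """Run the code in a separate Python process and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()


def run_query(
    prompt: str,
    generate: Callable[[str], str],
    execute: Callable[[str], str],
) -> str:
    """Keep generation and execution separate so either side can be swapped."""
    return execute(generate(prompt))


if __name__ == "__main__":
    print(run_query("Add the numbers 1 through 10", generate_code, local_executor))
```

Because `run_query` depends only on the two function signatures, where the code runs is a deployment choice rather than a change to the calling code.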

Short-Circuiting Expensive Post-Processing

▶ Watch (11:53)

The standard MCP pattern sends tool call results into the context window, then the model re-processes that data with expensive output tokens. Smith showed a shortcut: if the data’s destination is the user, skip the post-processing. He referenced Prefect, a library where the model generates user interface components to display data directly instead of generating JSON. This enables generative user experiences built on a small amount of data.
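The shortcut can be sketched as a routing step that decides whether a tool result goes to the user or into the model's context. This is an illustration only: `ToolResult`, its `for_user` flag, `render_table`, and `route` are hypothetical names, not part of MCP or of the library Smith referenced.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ToolResult:
    rows: List[Dict[str, object]]
    for_user: bool  # hypothetical flag: True when the data is meant for display


def render_table(rows: List[Dict[str, object]]) -> str:
    """Render rows as a plain-text table without involving the model."""
    if not rows:
        return "(no rows)"
    headers = list(rows[0])
    lines = [" | ".join(headers)]
    lines += [" | ".join(str(row[h]) for h in headers) for row in rows]
    return "\n".join(lines)


def route(result: ToolResult) -> Tuple[str, str]:
    """Return (text shown to the user, text added to the model context)."""
    if result.for_user:
        # Short-circuit: show the data directly and give the model only a brief note.
        note = f"[displayed {len(result.rows)} row(s) to the user]"
        return render_table(result.rows), note
    # Standard path: the full payload enters the context for the model to process.
    return "", repr(result.rows)


if __name__ == "__main__":
    shown, context = route(ToolResult([{"model": "a", "likes": 3}], for_user=True))
    print(shown)
    print(context)
```

On the short-circuit path the model's context receives a one-line note rather than the full payload, so no output tokens are spent restating the data.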

MCP’s Product-Market Fit and What Came After Launch

▶ Watch (19:45)

Smith addressed the “did MCP die?” debate. MCP hit product-market fit through experimentation. Features such as auth and remote transport weren’t present at launch. For consumer and enterprise cases, ease of use and regulatory support make MCP a perfect solution. Thousands of clients connect with a single URL for authenticated access. The resource-based URI scheme extension point paid off, letting the Apps SDK roll out quickly on a solid base of existing integrations.

Notable Quotes

MCP has kind of hit an almost perfect product market fit and it’s done that through experimentation. Shaun Smith · ▶ Watch (19:55)

if you’re at the point where you can one shot service the important point isn’t the distribution of code it’s actually the distribution of ideas Shaun Smith · ▶ Watch (22:54)

MCP’s matured. It’s become commodity infrastructure. That’s a very very good thing. Shaun Smith · ▶ Watch (23:17)

Key Takeaways

  • Reinforcement learning post-training lets models drive toward bigger goals with longer token sequences.
  • Hugging Face uses 45-token dynamic tool calls to access thousands of ML models via MCP.
  • Features missing from MCP at launch, such as auth, were later added by the community, proving its design flexibility.

About the Speaker

Shaun Smith leads Open Source MCP at Hugging Face and is an MCP Steering Committee member, serving as a Community Moderator and in the Transports Working Group.