One Spec, Ten SDKs, Zero Excuses: Conformance Testing MCP - Paul Carleton, Anthropic

by Paul Carleton

Day 1 · Broadway Ballroom North (6th Floor) · Protocol in Depth · 23 min · 4 min read

Applied AI

TL;DR

Paul Carleton demonstrated the MCP conformance test suite, which aims to automate verification of the specification's 567 must/should statements across 10 official SDKs; about 100 are covered so far. The suite runs scenarios against client and server implementations, checking up to 15 details per scenario. Four SDKs (TypeScript, Python, C#, Go) reached tier one, which requires a 100% pass rate. The suite also uncovered security bugs such as missing issuer validation and exposure to DNS rebinding attacks.

The 567 Opportunities for Divergence

▶ Watch (0:03)

The MCP specification contains 567 must and should statements. Paul Carleton opened with a pagination example: two client implementations, both defensible, diverge on how to treat an empty next_cursor. Because the expected server behavior is ambiguous, one of the two clients can loop forever. The result is incompatibilities and bug reports across the ecosystem. With 10 official SDKs plus custom implementations, each statement is a chance for divergence.
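The divergence can be sketched in a few lines. This is a minimal illustration, not code from the talk or from any SDK: fake_list_page, list_all_lenient, and list_all_strict are hypothetical names, and the server stub is assumed to mark the last page with an empty-string cursor and to treat an empty cursor as "start over". The request cap exists only so the looping variant terminates in the sketch.

```python
from typing import Optional

# Hypothetical server stub: two pages of tools. The last page carries an
# empty-string cursor instead of omitting the field, and an empty cursor
# sent back by a client restarts at page one.
PAGES = {
    None: (["tool_a", "tool_b"], "page2"),
    "page2": (["tool_c"], ""),
    "": (["tool_a", "tool_b"], "page2"),
}


def fake_list_page(cursor: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Return (items, next_cursor) for the requested page."""
    return PAGES[cursor]


def list_all_lenient(fetch, max_requests: int = 10):
    """Client A: any falsy cursor (missing or empty) means 'no more pages'."""
    items, cursor, requests = [], None, 0
    while requests < max_requests:
        page, cursor = fetch(cursor)
        requests += 1
        items.extend(page)
        if not cursor:
            break
    return items, requests


def list_all_strict(fetch, max_requests: int = 10):
    """Client B: only a missing cursor (None) ends pagination."""
    items, cursor, requests = [], None, 0
    while requests < max_requests:
        page, cursor = fetch(cursor)
        requests += 1
        items.extend(page)
        if cursor is None:
            break
    return items, requests


if __name__ == "__main__":
    print(list_all_lenient(fake_list_page))    # stops after two requests
    print(list_all_strict(fake_list_page)[1])  # runs until the request cap
```

Both clients are reasonable readings of "stop when there is no cursor"; they only disagree once a server sends an empty string, which is the kind of gap a shared conformance scenario is meant to pin down.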

Building an SDK-Agnostic Conformance Suite

▶ Watch (4:27)

Carleton designed the suite to be SDK-agnostic and to have minimal dependencies. It uses two harness modes: server-under-test and client-under-test. For a server, the suite spins up a client that runs a specific set of behavior probes and compares the responses. For a client, it spins up multiple example servers and runs the client against them. Each test produces checks that carry an ID, a status, a spec reference, and details. The suite does not require SDK developers to implement both sides of a scenario.
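A server-under-test probe and the check it emits might look roughly like the sketch below. The field names mirror the four attributes mentioned above, but the Check class, the probe_initialize function, the status strings, and the spec reference text are all hypothetical; the real suite's types and identifiers are not shown in this recap. The request uses empty params for brevity, which a real initialize request would not.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Check:
    """One verified detail of a scenario (hypothetical shape)."""
    id: str
    status: str  # "SUCCESS" or "FAILURE" in this sketch
    spec_reference: str
    details: dict[str, Any] = field(default_factory=dict)


def probe_initialize(send: Callable[[dict], dict]) -> Check:
    """Send one JSON-RPC initialize request and check the response shape."""
    request = {"jsonrpc": "2.0", "id": 1, "method": "initialize", "params": {}}
    response = send(request)
    # The response must echo the request id and carry a result member.
    ok = response.get("id") == request["id"] and "result" in response
    return Check(
        id="initialize-returns-result",
        status="SUCCESS" if ok else "FAILURE",
        spec_reference="(hypothetical) lifecycle section",
        details={"response_keys": sorted(response)},
    )


# Two stand-in servers: one answers correctly, one omits the result.
def good_server(req: dict) -> dict:
    return {"jsonrpc": "2.0", "id": req["id"], "result": {}}


def bad_server(req: dict) -> dict:
    return {"jsonrpc": "2.0", "id": req["id"]}


if __name__ == "__main__":
    print(probe_initialize(good_server))
    print(probe_initialize(bad_server))
```

Because the probe only needs a function that sends a request and returns a response, the same check can run against a server written with any SDK, which is the point of keeping the harness SDK-agnostic.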

Demo: Conformance Testing in Action

▶ Watch (9:03)

Carleton ran the TypeScript SDK’s everything client against an auth scenario using npx model-context-protocol-conformance. The suite launched a server; the client posted to the MCP endpoint, received a 401, and walked through OAuth metadata discovery and authorization. The suite performed 15 checks, including verification of a valid bearer token. The test passed. Verbose mode displayed each check with spec references and timestamps.
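The first step after the 401 can be sketched as follows. The recap does not say how the client locates the OAuth metadata, so this assumes the server advertises it through the resource_metadata parameter of the WWW-Authenticate header, as defined in RFC 9728. The function name and the example URL are hypothetical, and header lookup is simplified to an exact-case dictionary key.

```python
import re
from typing import Optional


def resource_metadata_url(status: int, headers: dict[str, str]) -> Optional[str]:
    """Return the resource_metadata URL from a 401 Bearer challenge, if any."""
    if status != 401:
        return None
    challenge = headers.get("WWW-Authenticate", "")
    # Only Bearer challenges are relevant to this flow.
    if not challenge.lower().startswith("bearer"):
        return None
    match = re.search(r'resource_metadata="([^"]+)"', challenge)
    return match.group(1) if match else None


if __name__ == "__main__":
    example_headers = {
        "WWW-Authenticate": (
            'Bearer resource_metadata='
            '"https://mcp.example.com/.well-known/oauth-protected-resource"'
        )
    }
    print(resource_metadata_url(401, example_headers))
```

A client would then fetch that URL, find the authorization server, complete authorization, and retry the original request with a bearer token.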

Tiering Results and Security Wins

▶ Watch (15:10)

SEP 1730 introduced tiering for SDKs. Tier one requires 100% conformance on the latest test batch, issue triage within two days, and a 7-day turnaround on P0 security bugs. Four SDKs achieved tier one: TypeScript, Python, C#, and Go. Java and Rust reached tier two. Conformance testing also uncovered security bugs. The Go SDK correctly rejected an invalid issuer, which revealed that other SDKs lacked issuer validation. DNS rebinding vulnerabilities in localhost servers were also caught and fixed across all SDKs.
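The two classes of bug can be illustrated with a pair of small guards. This is a sketch under stated assumptions, not the fix any SDK shipped: issuer_matches assumes the rule from RFC 8414 that the issuer in authorization server metadata must be identical to the issuer the client asked about, and host_allowed assumes a common DNS rebinding mitigation in which a localhost server rejects requests whose Host header is not a loopback name. Both function names and the ALLOWED_HOSTS set are hypothetical.

```python
def issuer_matches(expected_issuer: str, metadata: dict) -> bool:
    """Accept metadata only if its issuer is exactly the one we asked about."""
    return metadata.get("issuer") == expected_issuer


# Loopback names a locally bound server is willing to answer for (assumed).
ALLOWED_HOSTS = {"localhost", "127.0.0.1", "[::1]"}


def host_allowed(host_header: str) -> bool:
    """Reject requests whose Host header is not a loopback name."""
    host = host_header.strip().lower()
    # Strip an optional :port suffix, leaving bracketed IPv6 literals intact.
    if host.startswith("["):
        host = host[: host.find("]") + 1]
    else:
        host = host.split(":", 1)[0]
    return host in ALLOWED_HOSTS


if __name__ == "__main__":
    good = {"issuer": "https://auth.example.com"}
    spoofed = {"issuer": "https://evil.example.com"}
    print(issuer_matches("https://auth.example.com", good))
    print(issuer_matches("https://auth.example.com", spoofed))
    print(host_allowed("localhost:3000"))
    print(host_allowed("evil.example.com:3000"))
```

A conformance scenario for either guard only has to present the bad input (a mismatched issuer, a foreign Host header) and check that the implementation refuses it.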

The Road Ahead: Full Coverage and Mandatory Scenarios

▶ Watch (18:41)

The current suite covers about 100 of the 567 must/should statements. Carleton identified 266 more that are testable and 200 that are not testable. He proposed SEP 2484, which would require every new SEP to include a conformance scenario or explain why it cannot be tested. The June spec release will be the first to apply this rule. Carleton invited feedback on the proposal.
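As a rough worked calculation, the figures above imply the following coverage levels. The inputs are the rounded numbers quoted in the talk ("about 100"), so the three buckets sum to 566 rather than exactly 567 and the percentages are approximate.

```python
TOTAL = 567  # must/should statements in the specification

covered = 100             # approximate, as quoted in the talk
testable_remaining = 266
not_testable = 200

# The rounded buckets account for all but one of the statements.
accounted = covered + testable_remaining + not_testable

current_coverage = covered / TOTAL
max_coverage = (covered + testable_remaining) / TOTAL

if __name__ == "__main__":
    print(f"accounted for: {accounted} of {TOTAL}")
    print(f"current coverage: {current_coverage:.1%}")
    print(f"ceiling if every testable statement is covered: {max_coverage:.1%}")
```

On these numbers the suite covers a bit under a fifth of the statements today and could reach roughly two thirds, with the remainder falling in the not-testable bucket.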

Q&A

Is there an authorization server test suite? Carleton said they recently added one that probes the authorization server directly, but it’s still in progress. ▶ Watch (21:32)

How well does AI work for generating conformance scenarios? Carleton said it works well for generating the test code, but the scenarios themselves need more scrutiny because mistakes create costs for everyone. ▶ Watch (22:04)

Notable Quotes

567 must and should statements in the 2025-11-25 specification. Paul Carleton · ▶ Watch (3:14)

We wanted to be minimal dependencies. Paul Carleton · ▶ Watch (7:08)

The goal is you implement an everything server, you implement an everything client. Paul Carleton · ▶ Watch (11:43)

We have four tier one SDKs. So, TypeScript, Python, C#, and Go are all tier one. Paul Carleton · ▶ Watch (16:57)

If you mess up a scenario, that just creates a cost for everyone if you get it merged. Paul Carleton · ▶ Watch (22:31)

Key Takeaways

The MCP spec has 567 must/should statements; conformance testing automates consistency checks.
The test suite is SDK-agnostic and uses two harness modes for client and server testing.
Four SDKs achieved tier-one conformance, and the suite caught security bugs such as missing issuer validation.

About the Speaker

Paul Carleton is a Core Maintainer of the Model Context Protocol and Auth Nerd at Anthropic, where he leads auth implementations across Anthropic’s clients and the TypeScript and Python SDKs. He drives MCP conformance testing efforts to ensure consistent behavior across the ecosystem.
