MCP Live: Streaming Context To AI Agents - Harshit Kohli, Amazon Web Services

by Harshit Kohli

Day 1 · Astor Ballroom (7th Floor) · MCP Best Practices · 27 min · 4 min read

Applied AI AI Infrastructure

TL;DR

Harshit Kohli demonstrated a streaming MCP architecture that pushes real-time log events from Kafka to AI agents via WebSockets, replacing the standard request-response model. The demo used a 500-event bounded queue for back pressure, a 30% failure threshold for anomaly detection, and Cedar policies for per-request authorization. Agents detected anomalies and answered queries in real time.

The Stale Context Problem

▶ Watch (3:28)

A financial or healthcare application cannot wait 15 minutes for an AI agent to poll logs. Kohli described a scenario where an agent polls at 2:00 a.m. and again at 2:15 a.m. Anomalies that occurred at 2:05 a.m. are only reported at 2:15 a.m. That 10-minute blind spot is the stale context problem. Streaming events directly to agents eliminates the gap. Detection drops from minutes to seconds or milliseconds.

Streaming MCP Architecture

▶ Watch (5:22)

The standard MCP request-response model sends a request and waits for a reply. Kohli’s streaming MCP inverts that flow. A Kafka consumer subscribes to event topics and pushes messages through a WebSocket layer to AI agents in real time. Kafka provides durability, replay, and back-pressure handling. The MCP server decouples the streaming platform from the agent code. Switching from Kafka to Kinesis requires changing only the streaming block, not the agents.

Bounded Queues and Back Pressure

▶ Watch (7:27)

Slow agents break real-time pipelines. Kohli implemented a bounded queue of 500 events per agent. If an agent’s buffer exceeds 500 events, the system drops that agent from the pipeline. The recommended production pattern is to detach the slow agent, fix it, and reconnect it. Kafka retains offsets, so no data is lost. A dead letter queue can capture failed events for replay.

Granular Authorization with Cedar

▶ Watch (9:32)

Kohli used Cedar for per-request authorization at the connection level. Default is deny. Policies define which agents can read messages, get anomalies, or access context. Agent one might be allowed to get anomalies. Agent two might only read messages. Cedar operates at a finer granularity than IAM, which works at the infrastructure layer. Authentication can use mTLS or another mechanism.

Anomaly Detection and Demo

▶ Watch (12:57)

The demo used a sliding window of 100 events with a 30% failure threshold. If 15 of 100 events are failures, the system triggers a spike. Kohli defined six to seven rules for anomaly detection. The agent can also accept interactive queries like “what is the status?” and return event counts and anomaly summaries. The architecture can be extended with an LLM for root cause analysis and auto-remediation.

Q&A

How do you categorize anomalies when events are hard to define? Anomaly rules depend on the use case, such as detecting bot activity on an e-commerce site. ▶ Watch (22:51)

How do you handle LLMs running out of context with large log files? Use enough Kafka partitions for proper streaming and select an LLM model with sufficient token capacity. Start small, test, then scale. ▶ Watch (23:50)

What is the stop condition for an agent? The 500-event buffer is the stop condition. If the buffer fills, the agent is dropped. In production, remove the agent, fix it, and reconnect it. ▶ Watch (25:29)

Notable Quotes

Imagine a situation where you have AI agents which are getting the real-time context of the real-time data and they’re getting the live feed and you’re able to use those AI agents with the real-time data, right? Harshit Kohli · ▶ Watch (0:03)

So, that is actually the stale problem and that is what we are going to handle as part of this use case, right? Harshit Kohli · ▶ Watch (4:41)

The system will drop that agent. And the reason why it will drop it because it’s going to disturb your entire real-time pipeline. Harshit Kohli · ▶ Watch (8:35)

Key Takeaways

Streaming MCP replaces polling with real-time event push from Kafka to agents.
A 500-event bounded queue drops slow agents to protect pipeline performance.
Cedar policies enforce per-request authorization at the connection level.
Anomaly detection uses a 30% failure threshold over a 100-event sliding window.
The MCP server decouples streaming platforms from agent code for easy swaps.

About the Speaker(s)

Harshit Kohli is a Senior Technical Account Manager at Amazon Web Services with 15+ years of experience. He has worked with AWS Data Analytics and GenAI services including Amazon Q, Bedrock, Amazon Managed Kafka, Amazon Data Firehose, and Kinesis Streams. He also has experience with Cloudera Hadoop, Hortonworks Hadoop, and Mapr Hadoop. ```

Filed under

Applied AI AI Infrastructure