Enabling Agentic Cloud Workflows - Santhosh Misro & Mayur Deshpande, Google

by Mayur Deshpande, Santhosh Misro

Day 1 · Juilliard Complex (5th Floor) · Apps and Agents · 22 min · 3 min read

Applied AI

TL;DR

Google Cloud Storage built a storage intelligence MCP server that lets AI agents query metadata for billions of objects in natural language. The server indexes bucket metadata into BigQuery and exposes tools for analysis and for operations. Production tuning cut token usage by 35%, and the median cost per session landed at about 9 cents. Caching roughly 11,000 tokens of static metadata per session reduced latency.

Storage management is manual and script-heavy

▶ Watch (0:38)

Storage admins handle tasks one at a time and maintain complex scripts. Google Cloud Storage customers have billions of objects scattered globally. Santhosh Misro described the goal: move from manual to autonomous storage management. The team built foundational capabilities that reached GA last year. Customers can index all data across their entire organization, create a queryable index, then use AI agents to analyze, compute, and act.

MCP server architecture: insights and operations

▶ Watch (2:58)

The MCP server exposes two tool sets. The insights toolkit uses BigQuery linked datasets, so agents can turn natural-language requests into SQL queries. The operations toolkit handles execution, such as creating buckets and moving objects. Snapshot consistency is key: agents always act on a complete snapshot of the metadata rather than on an eventually consistent view. A schema metadata tool reduces hallucinations by telling the LLM exactly which tables to query.
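The split between the two toolkits can be sketched as a small tool registry. This is an illustrative sketch only, not the team's implementation and not an MCP SDK: the `Toolkit` class, the tool names (`get_schema`, `run_sql`, `create_bucket`), and the schema contents are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Toolkit:
    """A named group of tools that an agent can call."""
    name: str
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def register(self, tool_name: str, fn: Callable[..., object]) -> None:
        self.tools[tool_name] = fn


# Hypothetical schema (table -> columns); not the real dataset layout.
SCHEMA = {"object_metadata": ["bucket", "name", "size_bytes", "content_type", "created_at"]}


def get_schema() -> dict:
    """Schema metadata tool: tells the model which tables and columns exist."""
    return SCHEMA


def run_sql(query: str) -> str:
    """Placeholder for a read-only query against the metadata snapshot."""
    return f"would run: {query}"


def create_bucket(name: str) -> str:
    """Placeholder for a mutating storage operation."""
    return f"would create bucket: {name}"


# Analysis tools and action tools live in separate toolkits.
insights = Toolkit("insights")
insights.register("get_schema", get_schema)
insights.register("run_sql", run_sql)

operations = Toolkit("operations")
operations.register("create_bucket", create_bucket)


def call_tool(toolkit: Toolkit, tool_name: str, **kwargs):
    """Dispatch a tool call, rejecting names the toolkit does not expose."""
    if tool_name not in toolkit.tools:
        raise KeyError(f"{toolkit.name} has no tool named {tool_name!r}")
    return toolkit.tools[tool_name](**kwargs)
```

Keeping read-only analysis separate from mutating operations means a caller could enable the insights toolkit on its own, with no risk of the agent changing any storage state.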

Demo: find and archive old JPEGs across buckets

▶ Watch (6:31)

Using Gemini CLI, a storage admin asked the agent to find all JPEG files older than 30 days and smaller than 2 MB across all buckets. The agent first fetched the BigQuery schema, then translated the request to SQL, queried the metadata snapshot, and identified the matching objects. It then switched to the operations toolkit, created an archive bucket, copied the files, and set them to archive storage. The demo ran against billions of objects in seconds.
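The kind of SQL the agent might produce for this request can be sketched as follows. The talk did not show the generated query, so the table name and the column names (`bucket`, `name`, `size_bytes`, `content_type`, `created_at`) are hypothetical; in the demo they would come from the schema metadata tool. The sketch also assumes that "2 MB" means 2 MiB and that age is measured from creation time.

```python
import re

MAX_SIZE_BYTES = 2 * 1024 * 1024  # assumption: "2 MB" is read as 2 MiB


def build_query(table: str,
                content_type: str = "image/jpeg",
                min_age_days: int = 30,
                max_size_bytes: int = MAX_SIZE_BYTES) -> str:
    """Build a BigQuery Standard SQL query over a hypothetical metadata table."""
    # The values are interpolated into SQL text, so validate them first.
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[a-z]+/[a-z0-9.+\-]+", content_type):
        raise ValueError(f"unexpected content type: {content_type!r}")
    return (
        "SELECT bucket, name, size_bytes\n"
        f"FROM `{table}`\n"
        f"WHERE content_type = '{content_type}'\n"
        f"  AND size_bytes < {int(max_size_bytes)}\n"
        "  AND created_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), "
        f"INTERVAL {int(min_age_days)} DAY)"
    )


if __name__ == "__main__":
    # Hypothetical fully qualified table name.
    print(build_query("my_project.my_dataset.object_metadata"))
```

The query touches only the metadata index and never the objects themselves, which fits the talk's description of the agent querying a metadata snapshot.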

Production tuning: instrumentation, caching, cost

▶ Watch (10:39)

The team tracked over 20 data points per Gemini CLI session. They merged tools to reduce tool chaining and made discovery optional: if the agent already knew insights was enabled, it skipped the recheck. These changes cut token usage by 35%. Static metadata such as table schemas was cached between follow-up queries, saving about 11,000 tokens per session. Median cost per session landed at about 9 cents. Reasoning depth and wait time were tuned so that a needle-in-a-haystack query took the agent about 65 seconds.
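One way to picture caching static metadata and tracking the tokens it saves is a per-session cache that counts the size of every payload it did not have to fetch again. This is a sketch under assumptions: the talk did not describe how the cache or the token accounting was built, and `SessionCache` and `rough_token_count` are hypothetical names, with the token counter a crude stand-in for a real tokenizer.

```python
from typing import Callable, Dict


class SessionCache:
    """Caches static tool results for one session and counts tokens avoided."""

    def __init__(self, count_tokens: Callable[[str], int]):
        self._store: Dict[str, str] = {}
        self._count_tokens = count_tokens
        self.tokens_saved = 0

    def get_or_fetch(self, key: str, fetch: Callable[[], str]) -> str:
        if key in self._store:
            # A hit means this payload was not fetched and re-sent.
            self.tokens_saved += self._count_tokens(self._store[key])
            return self._store[key]
        value = fetch()
        self._store[key] = value
        return value


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


if __name__ == "__main__":
    cache = SessionCache(rough_token_count)

    def fetch_schema() -> str:
        # Hypothetical schema text returned by a schema tool.
        return "bucket name size_bytes content_type created_at"

    cache.get_or_fetch("schema", fetch_schema)  # first query: fetched
    cache.get_or_fetch("schema", fetch_schema)  # follow-up query: served from cache
    print(cache.tokens_saved)
```

Recording the saving as a counter mirrors the talk's emphasis on instrumentation: it puts a number on the cache's effect for each session.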

Q&A

How did you define shortcuts for the agent? The team merged tools and had the agent check its cache first, so it skips redundant discovery calls. ▶ Watch (17:07)

What about stale cache? Only static metadata is cached; feature-enabled flags carry TTLs so that they are refreshed. ▶ Watch (18:26)

Are you letting the agent compose its own tools from granular ones? Not yet, but the team is exploring skills: workflows encoded in markdown files that the agent can pick from. ▶ Watch (20:11)
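The stale-cache answer above distinguishes static metadata from flags that expire. A minimal sketch of that distinction, assuming a simple in-memory cache, is shown here; the `TTLCache` class, the key names, and the 300-second TTL are hypothetical and were not given in the talk.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache where each entry may carry an optional time-to-live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._entries: Dict[str, Tuple[object, Optional[float]]] = {}

    def put(self, key: str, value: object, ttl_seconds: Optional[float] = None) -> None:
        # ttl_seconds=None marks the entry as static: it never expires.
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._entries[key] = (value, expires_at)

    def get(self, key: str) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-checks the source.
            del self._entries[key]
            return None
        return value


if __name__ == "__main__":
    cache = TTLCache()
    cache.put("schema", {"object_metadata": ["bucket", "name"]})  # static
    cache.put("insights_enabled", True, ttl_seconds=300)          # refreshed after TTL
    print(cache.get("insights_enabled"))
```

Passing the clock in as a parameter makes expiry testable without waiting for real time to pass.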

Notable Quotes

That was the easy part, creating the prototype, the demo. But now, you know, as all of you know, trying to productionize these agents is the hard part. Santhosh Misro · ▶ Watch (10:39)

We ended up caching about 11,000-odd tokens, and we kept track of those, and this helped overall reduce the latency. Santhosh Misro · ▶ Watch (13:52)

Overall, median cost per session ended up being around 9 cents or so. Santhosh Misro · ▶ Watch (14:18)

Key Takeaways

Google’s storage MCP server lets agents query billions of objects via natural language.
Production tuning cut token usage by 35%; median cost landed at about 9 cents per session.
Caching static metadata saved about 11,000 tokens per session and reduced latency.
The server splits analysis (insights) and action (operations) into separate tool sets.
Snapshot consistency ensures agents act on a complete snapshot of the metadata, not an eventually consistent view.
