The 90% Unstructured Data Opportunity
Box stores over an exabyte of data for 120,000 companies. 90% of that is unstructured: PDFs, images, medical records, product specs, movie scripts. AI agents need to read and edit this content to automate business workflows, but unstructured data has no relational schema, and historically only humans could read and interpret it. Box built an MCP server to connect third-party agents (ChatGPT, Copilot, Claude) to that content while preserving enterprise security and governance.
The Context Window Problem
When a user asks an agent to edit a presentation, the MCP server downloads the binary file, and the file enters the LLM context window. Binary data becomes high-entropy, meaningless tokens that dilute the model's attention on the meaningful ones. The agent then unpacks, edits, repackages, and reuploads the file, and each step requires an LLM inference call, so the file passes through the context twice. Payload size limits block even moderately sized files, and users wait. Because LLM inference is non-deterministic, the file can also be corrupted along the way.
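To put rough numbers on the payload problem, the sketch below assumes the file travels as base64 text, which is how MCP represents binary resource contents, and applies a hypothetical 1 MiB per-message limit. Neither the limit nor the 900 KiB deck size comes from the talk; they are illustrative.

```python
import base64

# Hypothetical per-message payload limit, for illustration only.
PAYLOAD_LIMIT_BYTES = 1024 * 1024


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of binary data."""
    return 4 * ((n_bytes + 2) // 3)


def fits_in_payload(n_bytes: int, limit: int = PAYLOAD_LIMIT_BYTES) -> bool:
    """True if the base64-encoded file fits under the payload limit."""
    return base64_size(n_bytes) <= limit


# Cross-check the formula against the standard library encoder.
sample = bytes(range(256)) * 10  # 2,560 bytes of binary data
assert len(base64.b64encode(sample)) == base64_size(len(sample))

deck = 900 * 1024             # a 900 KiB presentation
print(base64_size(deck))      # 1228800: about a third larger once encoded
print(fits_in_payload(deck))  # False under the illustrative 1 MiB limit
print(2 * base64_size(deck))  # 2457600: download plus re-upload through context
```

A file that is comfortably under the limit as raw bytes can exceed it once encoded, and the round trip doubles the tokens spent on content the model cannot meaningfully read.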
Code Mode and Signed URLs
Box uses two patterns to keep file content out of the context window. Code mode lets the LLM write Python that calls MCP tools directly, so the output of a download call feeds straight into an edit call with no inference step in between. Signed URLs return a short-lived download link instead of the file, and the agent fetches the file itself via a shell command or network request. Both keep file bytes out of the context window, but each has a prerequisite: code mode requires client support, and signed URLs require shell access plus careful management of single-use signatures.
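A minimal sketch of both patterns, assuming an HMAC-signed link with an expiry and a one-time nonce. The URL scheme, `SECRET`, `TTL_SECONDS`, the in-memory `FILES` store, and every function name here are hypothetical, not Box's API, and the "network fetch" is an in-process call to keep the example self-contained. `code_mode_edit` stands in for a script the LLM might write in code mode: the bytes stay inside the Python process, and only a short summary string would go back to the model.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical signing key and link lifetime; both are illustrative.
SECRET = secrets.token_bytes(32)
TTL_SECONDS = 60

# In-memory stand-in for file storage.
FILES = {"deck-123": b"%PDF-fake-binary..."}
_used_nonces: set = set()  # nonces of links that were already redeemed


def _sign(file_id: str, expires: int, nonce: str) -> str:
    """HMAC over the file ID, expiry, and nonce so none can be altered."""
    msg = f"{file_id}|{expires}|{nonce}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def issue_signed_url(file_id: str, now: Optional[float] = None) -> str:
    """What a download tool could return instead of the file bytes."""
    now = time.time() if now is None else now
    expires = int(now) + TTL_SECONDS
    nonce = secrets.token_hex(8)
    sig = _sign(file_id, expires, nonce)
    return f"https://files.example.com/{file_id}?exp={expires}&n={nonce}&sig={sig}"


def redeem(url: str, now: Optional[float] = None) -> bytes:
    """Server side: check signature, expiry, and single use before serving."""
    now = time.time() if now is None else now
    parts = urlsplit(url)
    file_id = parts.path.lstrip("/")
    query = parse_qs(parts.query)
    expires, nonce, sig = int(query["exp"][0]), query["n"][0], query["sig"][0]
    if not hmac.compare_digest(sig, _sign(file_id, expires, nonce)):
        raise PermissionError("invalid signature")
    if now > expires:
        raise PermissionError("link expired")
    if nonce in _used_nonces:
        raise PermissionError("link already used")
    _used_nonces.add(nonce)
    return FILES[file_id]


def code_mode_edit(file_id: str) -> str:
    """A script an LLM might write in code mode: bytes never leave this process."""
    data = redeem(issue_signed_url(file_id))   # fetch via the short-lived link
    edited = data.replace(b"fake", b"edited")  # placeholder for a real edit step
    FILES[file_id] = edited                    # stand-in for the re-upload tool
    # Only this one-line summary would be returned to the model's context.
    return f"updated {file_id}: {len(data)} -> {len(edited)} bytes"
```

Calling `code_mode_edit("deck-123")` returns `"updated deck-123: 19 -> 21 bytes"`, which is all the model would see. Recording the nonce on first use is what makes a leaked link worthless after one download, and the short expiry bounds the window before that happens.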
The Lethal Trifecta: Prompt Injection and Exfiltration
Three conditions together create the risk: access to sensitive data, exposure to untrusted input, and the ability to communicate externally. Box’s MCP server meets all three. In a demo, Bob shares a file containing malicious instructions. Alice’s agent reads the file, treats the instructions as a system directive, and exfiltrates data by adding Bob as a collaborator. Because LLMs cannot separate data from instructions, any state-changing operation (renaming a folder, sharing a file) becomes a potential exfiltration vector.
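The three conditions can be checked mechanically over a tool set. The sketch below does that, then adds a simple taint-tracking guard: once a tool has pulled untrusted content into the session, outward-facing calls are refused unless a human approves them. The tool names and flags are hypothetical, loosely modeled on the Bob and Alice demo, and the taint guard is a commonly discussed mitigation used here for illustration, not something the talk attributes to Box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    reads_sensitive_data: bool = False
    ingests_untrusted_input: bool = False  # e.g. content other users can write
    communicates_externally: bool = False  # e.g. sharing, adding collaborators


def has_lethal_trifecta(tools) -> bool:
    """True when the tool set, taken together, satisfies all three conditions."""
    return (
        any(t.reads_sensitive_data for t in tools)
        and any(t.ingests_untrusted_input for t in tools)
        and any(t.communicates_externally for t in tools)
    )


# Hypothetical tools loosely modeled on the demo.
READ_FILE = Tool("read_file", reads_sensitive_data=True, ingests_untrusted_input=True)
ADD_COLLABORATOR = Tool("add_collaborator", communicates_externally=True)


class Session:
    """Refuse outward-facing calls once untrusted content has entered the context."""

    def __init__(self) -> None:
        self.tainted = False

    def call(self, tool: Tool, approved: bool = False) -> str:
        if tool.communicates_externally and self.tainted and not approved:
            raise PermissionError(f"{tool.name} blocked after untrusted input")
        if tool.ingests_untrusted_input:
            self.tainted = True
        return f"ran {tool.name}"
```

In the demo's sequence, Alice's agent calls `read_file` on Bob's document, which taints the session, so the injected `add_collaborator` call raises instead of running. Removing any one of the three capabilities from the tool set makes `has_lethal_trifecta` return `False`.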
Admin Guardrails and Human-in-the-Loop
Box’s admin console lets IT toggle MCP tools per client, for example read-only for ChatGPT and read-write for Claude. Policies can filter by file label (such as confidential) or restrict collaboration to the company domain. These guardrails are binary: a tool is either available or not. For more granular control, Box plans human-in-the-loop approval via MCP elicitation, so a user requesting a delete on a sensitive file would have to confirm before the operation executes.
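One way the per-client toggle, label filter, domain restriction, and planned approval step could compose is sketched below. This is not Box's admin console API: the client keys, tool names, the `sensitive` label, the `example.com` domain, and the `confirm` callback (standing in for an MCP elicitation round trip to the user) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Set


@dataclass
class ClientPolicy:
    allowed_tools: Set[str]                                # per-client on/off toggles
    blocked_labels: Set[str] = field(default_factory=set)  # e.g. {"confidential"}
    collab_domain: Optional[str] = None                    # limit new collaborators


# Hypothetical admin configuration echoing the examples in the text.
POLICIES = {
    "chatgpt": ClientPolicy(allowed_tools={"read_file", "search"}),  # read-only
    "claude": ClientPolicy(
        allowed_tools={"read_file", "search", "update_file",
                       "add_collaborator", "delete_file"},          # read-write
        blocked_labels={"confidential"},
        collab_domain="example.com",
    ),
}

# Operations that need a human's approval on files labeled "sensitive".
NEEDS_APPROVAL = {"delete_file"}


def authorize(
    client: str,
    tool: str,
    file_labels: Set[str],
    collaborator: Optional[str] = None,
    confirm: Optional[Callable[[str], bool]] = None,
) -> bool:
    policy = POLICIES.get(client)
    if policy is None or tool not in policy.allowed_tools:
        return False  # binary guardrail: tool not exposed to this client
    if file_labels & policy.blocked_labels:
        return False  # label-based filter
    if tool == "add_collaborator" and policy.collab_domain:
        if collaborator is None or not collaborator.endswith("@" + policy.collab_domain):
            return False  # collaboration restricted to the company domain
    if tool in NEEDS_APPROVAL and "sensitive" in file_labels:
        # Stand-in for an MCP elicitation round trip to the user.
        return confirm is not None and confirm(f"Allow {tool} on a sensitive file?")
    return True
```

The first three checks are the binary guardrails; only the last one asks a person, which is what lets a specific operation be approved instead of blocking the tool outright.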
Q&A
Are you mixing code mode and signed URLs based on client capability? Box builds a generic API, and clients choose the method that works in their environment. Code mode enables composability where it is supported; otherwise, signed URLs work for clients that have network access. ▶ Watch (26:11)
How do you balance admin constraints with user needs for sensitive files? The admin is gatekeeper, but users can make a business case. Human-in-the-loop can approve specific operations instead of a full block. ▶ Watch (29:21)
Notable Quotes
This is a huge abuse of a context usage, and also this is just a bad UX, right? This is super slow. Kailas Krivanka · ▶ Watch (6:58)
LLMs can’t differentiate between the data and instructions. Kailas Krivanka · ▶ Watch (19:36)
any state-changing operation with unstructured input can be an exfiltration vector. Kailas Krivanka · ▶ Watch (19:53)
Key Takeaways
- Binary files in LLM context waste tokens and risk data corruption.
- Code mode and signed URLs bypass the context window for file transfers.
- Prompt injection turns any state-changing tool call into a data leak.
About the Speaker(s)
Fernando Cerenza leads Box’s partner integration ecosystem, where he oversees a vast network of over 1,500 application integrations. He is currently spearheading Box’s AI-focused initiatives, driving development on MCP and A2A to enable advanced agentic AI outcomes and seamless…
Kailas Krivanka is a software engineer with expertise in API design, software architecture, and distributed systems. He has worked at Box for 4 years, building scalable systems and solving complex technical challenges including launching the Box MCP server. With experience across…