Autonomy Is Not Binary
Bits AI investigates thousands of production alerts each week for 4,000 customers. The team chose a supervised autonomy model. The agent autonomously investigates incidents but requires human approval before merging a fix. This avoids making a sleepy engineer lose time. Autonomy sits on a spectrum. High-risk systems like infra or payment always need human approval. Low-risk, high-clarity tasks like tagging tickets can run fully autonomous.
Hypothesis-Driven Reasoning Beats Fixed Workflows
Fixed workflows failed when multi-service failures or missing context occurred. The team switched to adaptive reasoning. The agent generates multiple hypotheses, tests each, and validates dynamically. Engineers can step in mid-investigation to guide the agent. For example, they can say “don’t look into that service, we killed that months ago.” This mirrors how a human engineer thinks during an incident.
Surface Uncertainty, Don’t Hide It
The agent labels investigations as conclusive or inconclusive. Inconclusive means the root cause was not found. Datadog does not charge for inconclusive investigations. This built trust and encouraged usage. The hypothesis tree shows the agent’s reasoning process transparently. When GPT 5.2 caused hallucinations, the team rolled back in minutes. Being ready for model changes is critical.
Feedback Loops That Capture Partial Correctness
Initial feedback options were yes and no. Users rarely clicked either because the answer was partly correct. Datadog changed to no and not quite. “Not quite” became the most clicked option. A comment box collected reasons like “it forgot to check my logs.” Each comment became an eval point. The team used these to add missing checks. Internal engineers tested the product for a full year before GA.
Notable Quotes
“autonomy is not binary” John Su · ▶ Watch (6:33)
“we didn’t hide the uncertainty here but we actually surfaced it to the user.” John Su · ▶ Watch (11:46)
“the not quite answer was actually became the most clicked answer.” John Su · ▶ Watch (16:26)
“it started creating hallucinations everywhere.” John Su · ▶ Watch (13:09)
Key Takeaways
- Autonomy is a spectrum; match it to risk and user expectations.
- Hypothesis-driven reasoning outperforms fixed workflows in dynamic production environments.
- Surface uncertainty transparently to build trust and enable usage-based pricing.