AI Agents in Production: What Breaks, and the Catch-Rate Log From Making One AI Check Another

The single most useful thing I do with AI in production is make a second model check the first one's work.

I keep a running log of it. Of 59 second-opinion reviews, where one AI model reviews another's code before it ships, 52 are settled, and in 46 of those a different model caught a real bug. That is an 88 percent hit rate on work the first model already thought was finished. A model is worst at seeing its own mistakes. A different model, with different blind spots, is the cheapest fix I have found.

I run about two dozen agents across two AI-powered businesses. I wrote up the framework for keeping them honest as a free course, "Run AI agents that don't lie, leak, or overspend." This is the field data underneath it: where they actually broke, and what the numbers were.

The failure ledger

Four ways agents fail in production, and the fix that actually holds for each. None of the fixes is a smarter prompt.

Failure	What it looked like	What it cost	The fix that holds
Hallucinate	An agent invented a whole grid of data, names and numbers, internally consistent.	It passed a deterministic check that only read the agent's own output.	Verify against an independent source, never the artifact the model produced.
Leak	An agent with web access and real credentials is one bad instruction from sending data out.	A single leaked secret once forced a full credential rotation.	Least privilege, secrets by reference, and treat the web as data, never instructions.
Overspend	One high-volume job quietly ran on a frontier cloud model.	About 50 dollars a day, silently.	Move it local with an empty fallback that stalls instead of paying. Now near zero.
Break	Agents flap between local and cloud models, gateways die after a restart, a rotated key fails every call.	Hours lost. Once a safety check was silently offline for a stretch.	A zero-cost watchdog that reads the running system, and change one thing at a time.

88% of settled second-opinion reviews caught a real bug

46 / 52 reviews where a second model found something real

About two dozen agents running in production

$50 → ~$0 per day after moving one job to a local model

Do AI agents hallucinate in production? Yes, and the scary part is when it passes your check

One of my agents reads source documents and extracts structured claims. One day it invented an entire grid of figures. Tier names, dollar amounts, the bands they applied to, all of it fabricated, and all of it internally consistent. The part that should worry anyone building this: it passed the deterministic check I had built to catch exactly this.

The check was looking in the wrong place. It only confirmed that each number the agent asserted also appeared in the same document the agent had produced. So when the agent fabricated the figure and the supporting text together, the fake number sat inside the fake excerpt, and it grounded. Zero flags.

A check that reads the same text the model produced cannot catch a self-consistent lie.

What caught it was a second step that ignored the agent's output entirely and went back to the live source to re-derive the claims from scratch. That independent re-fetch surfaced at least five fabrications the first gate had cleared. The lesson is not that the model is bad. The lesson is that verification has to come from a source the model did not get to write.
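The difference between the two gates can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline described above: the `Claim` type, the function names, and the sample strings are all hypothetical, and the "live source" is a plain string standing in for a fresh fetch.

```python
import re
from dataclasses import dataclass


@dataclass
class Claim:
    value: str    # figure the agent asserts, e.g. "4,200"
    excerpt: str  # supporting text, as quoted by the agent itself


def numbers_in(text: str) -> set[str]:
    """Return every number-like token found in the text."""
    return set(re.findall(r"\d+(?:,\d{3})*(?:\.\d+)?", text))


def self_grounded(claim: Claim) -> bool:
    """Weak gate: looks for the figure only in text the agent wrote."""
    return claim.value in numbers_in(claim.excerpt)


def source_grounded(claim: Claim, live_source: str) -> bool:
    """Independent gate: looks for the figure in text the agent never wrote."""
    return claim.value in numbers_in(live_source)


# Hypothetical data: the agent made up both the figure and its excerpt.
live_source = "The Basic tier costs 1,500 per year."
fabricated = Claim(value="4,200", excerpt="The Gold tier costs 4,200 per year.")

print(self_grounded(fabricated))                 # True: the fake grounds in itself
print(source_grounded(fabricated, live_source))  # False: the source never says it
```

The two functions differ only in which text they read, and that is the whole point. The first one can be satisfied by anything the agent writes. The second one cannot.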

Why a second model catches what the first cannot

This is the same principle as the catch-rate log up top, in a different place. When I have a different model review my code before it ships, it catches a real bug roughly 88 percent of the time. When a second step re-checks an agent's claims against the live source, it catches fabrications the agent's own pipeline cleared. Two different jobs, one idea: the check has to be independent of the thing being checked.

It works because a model tends to be most confident exactly where it is consistently wrong. Ask it to review its own work and it will defend the same mistake it just made, with the same reasoning. A different model has different failure modes, so it sees the gap the first one is blind to. That is the entire mechanism. It is not sophisticated. It is just two sets of eyes that do not share a blind spot.

I want to be honest about what that number is and is not. It is my own dogfooding log, not a controlled study. A review counts as a catch when the second model surfaced at least one real bug that I then fixed, and seven reviews are still open. It is a habit with a track record, not a benchmark.
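For anyone checking the arithmetic, here is how the figures fit together. The variable names are mine. The counts are the ones from the log above.

```python
total_reviews = 59  # every second-opinion review in the log
settled = 52        # reviews with a final outcome
catches = 46        # settled reviews where a real bug was found and fixed

still_open = total_reviews - settled
hit_rate = catches / settled  # open reviews are left out of the denominator

print(still_open)         # 7
print(f"{hit_rate:.1%}")  # 88.5%
```

The denominator is the 52 settled reviews, not all 59, so the rate says nothing yet about the seven that are still open.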

How agents overspend, and the control that fails closed

The most expensive mistake I made was quiet. One high-volume job ran constantly on a frontier cloud model, and it was costing around 50 dollars a day before I noticed. Nothing broke. The work got done. The bill just kept running.

The fix was to move that job onto a local model and give the agent an empty fallback list. If the local model is down, the agent stalls and waits. It does not silently fail over to an expensive cloud model to finish the job. That one change took the daily cost on that work to near zero. A loud stall is cheap. A silent spend is not.
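A fail-closed fallback can be as small as the sketch below. It assumes models are plain callables and that an outage shows up as a `ConnectionError`. `run_job`, `ModelUnavailable`, and the two stand-in models are hypothetical names, not any framework's API.

```python
from typing import Callable, Sequence

Model = Callable[[str], str]


class ModelUnavailable(Exception):
    """Raised when no permitted model can take the job."""


def run_job(prompt: str, primary: Model, fallbacks: Sequence[Model] = ()) -> str:
    """Try the primary model, then only the explicitly listed fallbacks.

    With an empty fallback list, a primary failure stalls the job
    instead of routing it to a model nobody agreed to pay for.
    """
    for model in (primary, *fallbacks):
        try:
            return model(prompt)
        except ConnectionError:
            continue  # this model is down; try the next permitted one
    raise ModelUnavailable("no permitted model available; job stalled")


def local_model_down(prompt: str) -> str:
    """Stand-in for a local model that is currently offline."""
    raise ConnectionError("local model is not responding")


def cloud_model(prompt: str) -> str:
    """Stand-in for a paid cloud model."""
    return "cloud answer"


try:
    run_job("summarize this", local_model_down)  # empty fallback list
except ModelUnavailable as err:
    print(f"stalled: {err}")
```

Passing `[cloud_model]` as the fallback list would finish the job and quietly bring the bill back. Leaving the list empty makes the outage loud instead.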

The same empty-fallback trick is also a privacy control. An agent that cannot fall back to the cloud also cannot ship private data there. The cost knob and the privacy knob turn out to be the same knob.

One honesty note, because it matters for trusting the number: I do not keep a per-agent dollar ledger. These are documented estimates from my own records, not a billed measurement. The figure that matters is the direction, and the direction was a cliff.

The rule under all of it: the agent proposes, code disposes

The strongest guardrail I have is not a better prompt. It is a hard limit on what the agent is allowed to do. The agent that keeps one of my sites current cannot touch the repository or run git. It writes a proposal. A separate, deterministic script does the commit and the push, on a branch, and only after I approve. The thinking is done by the model. The action is done by code I trust.
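The split might be sketched like this. It is an illustration under assumptions, not my actual script: `Proposal`, `agent_propose`, and `build_commands` are hypothetical names, and the function returns the git commands instead of running them so the example has no side effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    path: str         # file the agent wants changed
    new_content: str  # proposed replacement text
    summary: str      # becomes the commit message if approved


def agent_propose(path: str, new_content: str, summary: str) -> Proposal:
    """All the agent can do: describe a change. No file or git access here."""
    return Proposal(path, new_content, summary)


def build_commands(proposal: Proposal, approved: bool, branch: str) -> list[list[str]]:
    """Deterministic side: turn an approved proposal into git commands.

    A real script would write the file first and then run these commands.
    """
    if not approved:
        return []  # nothing happens without a human approval
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", "--", proposal.path],
        ["git", "commit", "-m", proposal.summary],
        ["git", "push", "origin", branch],
    ]


proposal = agent_propose("pricing.md", "updated text", "Refresh pricing page")
print(build_commands(proposal, approved=False, branch="update-pricing"))  # []
```

The model's output only ever reaches the repository as data inside a `Proposal`. Every command that touches git is assembled by code that behaves the same way every time.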

That rule exists because of scar tissue. A background process once committed and pushed an entire working tree with one-letter messages, and a hard reset wiped out uncommitted work. After that, no agent of mine gets to act directly on anything that matters. It can only propose.

The deepest version of this is structural. In my setup the real wall is a database grant: the credential the authoring agent holds physically cannot record an approval. Not because a smarter model is watching, and not because a scanner is reading every message, but because the permission simply does not exist for that account. A prompt is a request. A grant is a law. The wall that protects you is the one the model cannot talk its way past.
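The shape of that wall can be mirrored in a toy example. To be clear about the assumptions: in a real setup the database enforces this against the credential, not application code, and the role and action names below are hypothetical. The sketch only shows what it means for a permission not to exist.

```python
class PermissionDenied(Exception):
    """Raised when a role attempts an action it was never granted."""


# Hypothetical roles and actions. The authoring role has no approval
# permission at all, so there is nothing for it to misuse.
GRANTS: dict[str, frozenset[str]] = {
    "authoring_agent": frozenset({"write_draft"}),
    "human_reviewer": frozenset({"write_draft", "record_approval"}),
}


def perform(role: str, action: str, log: list[tuple[str, str]]) -> None:
    """Apply an action only if the role's grants include it."""
    if action not in GRANTS.get(role, frozenset()):
        raise PermissionDenied(f"{role} has no grant for {action}")
    log.append((role, action))


log: list[tuple[str, str]] = []
perform("authoring_agent", "write_draft", log)
perform("human_reviewer", "record_approval", log)
try:
    perform("authoring_agent", "record_approval", log)
except PermissionDenied as err:
    print(err)  # authoring_agent has no grant for record_approval
```

No prompt text appears anywhere in the decision. The only input that matters is which role is asking.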

Questions I get about running agents

Do AI agents hallucinate in production?

Yes. The dangerous case is a self-consistent fabrication that passes a check reading only the model's own output. In my logs, an independent re-fetch of the live source caught at least five fabrications that a deterministic same-document grounding check had cleared with zero flags. The fix is to verify against an independent source, never the artifact the model produced.

Does a second AI model actually catch bugs the first one misses?

In my own running log, yes. Across 52 settled second-opinion reviews, where a different model reviews the first one's code before it ships, the second model caught a real bug in 46 of them. That is an 88 percent hit rate. A model tends to be most confident exactly where it is consistently wrong, so a different model with different blind spots is the cheapest check I have found.

What do AI agents fail at most in production?

In my experience, three things: hallucinating with confidence, overspending silently, and acting outside the permission you assumed they had. Reliability is a constant fourth: gateways go down after a restart, a rotated key starts returning errors, and an agent quietly flaps between a local model and a cloud one.

How do you stop an AI agent from leaking data or overspending?

Least privilege and controls that fail closed. Scope every credential to the minimum, treat the web as data and never as instructions, and give cost-sensitive or privacy-sensitive agents an empty fallback list so they stall instead of silently failing over to an expensive or external cloud model. A loud stall is cheap. A silent spend or a silent leak is not.

None of this needs a smarter model.

It needs a second set of eyes that doesn't share the first one's blind spots.

Paul Takisaki
