The Trust Gap Just Got Real

Two weeks ago, AI in biotech was a story about regulators building infrastructure. Today, it is a story about regulators using it.

Three things happened in the two weeks from May 12 to May 25. FDA issued its first Warning Letter citing "excessive reliance on artificial intelligence" in drug manufacturing. Bristol Myers Squibb committed to deploying Claude Enterprise across 30,000+ employees, with explicit application to batch-release decisions, CAPA documentation, and clinical study report generation. AstraZeneca licensed Owkin's agentic AI platform to build bespoke biopharma agents touching competitive intelligence and IND-adjacent decisions.

These are not adjacent stories. They are the same story. AI is moving deeper into the regulatory-evidence layer of pharma. The trust gap, the distance between what an AI system actually did and what a regulator can verify it did, just stopped being theoretical.

FDA enforcement caught up.

The Warning Letter is the dominant signal of the cycle. An unnamed drug manufacturer was cited for "excessive reliance on artificial intelligence (AI) to create drug specifications, procedures, and production records, without adequate quality control oversight." This is the first FDA enforcement action that explicitly identifies AI overreliance as a cGMP violation.

The language matters. "Excessive reliance" is not "use of AI is prohibited." It is a precise diagnosis. The firm used AI without enough human review, without enough documented oversight, without enough evidence that the AI outputs were checked before they became regulated artifacts. This is the gap that has been widening for two years as vendors marketed "AI for batch records" and "AI for deviation investigation" without standardized evidence frameworks underneath.

For every sponsor running AI in GMP workflows, this is the moment to ask whether the system can produce a defensible chain of evidence for every regulated artifact it touches. Not a model card. Not a vendor SOC 2. An auditable record of what the model saw, what it concluded, and who reviewed the output before it entered the batch record.
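One way to picture that record is an append-only log in which each entry commits to the entry before it. The sketch below is illustrative only and is not drawn from the Warning Letter or any vendor product; the names EvidenceRecord, EvidenceLog, and every field are hypothetical. It stores hashes of what the model saw and what it concluded, the reviewer (if any), and a link to the previous entry, so a later edit to an earlier entry becomes detectable. A production system would also need timestamps, electronic signatures, and an externally anchored head hash.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import List, Optional


def sha256_hex(text: str) -> str:
    """Hex SHA-256 digest of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class EvidenceRecord:
    # All field names are hypothetical, chosen for illustration.
    artifact_id: str          # the regulated artifact the output fed into
    model_id: str             # which model or version produced the output
    input_hash: str           # hash of what the model saw
    output_hash: str          # hash of what the model concluded
    reviewer: Optional[str]   # who reviewed it; None means nobody yet
    prev_hash: str            # hash of the previous record in the log

    def record_hash(self) -> str:
        # Canonical JSON so the same record always hashes the same way.
        return sha256_hex(json.dumps(asdict(self), sort_keys=True))


class EvidenceLog:
    GENESIS = "0" * 64  # placeholder "previous hash" for the first record

    def __init__(self) -> None:
        self.records: List[EvidenceRecord] = []

    def append(self, artifact_id: str, model_id: str, model_input: str,
               model_output: str, reviewer: Optional[str] = None) -> EvidenceRecord:
        prev = self.records[-1].record_hash() if self.records else self.GENESIS
        rec = EvidenceRecord(artifact_id, model_id, sha256_hex(model_input),
                             sha256_hex(model_output), reviewer, prev)
        self.records.append(rec)
        return rec

    def verify(self) -> bool:
        """True if every record still links to the hash of its predecessor.

        Note: a change to the final record is only detectable if its hash
        is also stored somewhere outside this log.
        """
        prev = self.GENESIS
        for rec in self.records:
            if rec.prev_hash != prev:
                return False
            prev = rec.record_hash()
        return True

    def unreviewed(self) -> List[str]:
        """Artifact ids whose AI output has no recorded human reviewer."""
        return [r.artifact_id for r in self.records if r.reviewer is None]
```

Under these assumptions, unreviewed() is the query an inspector would effectively be running: which regulated artifacts took AI output with no named reviewer on record.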

BMS just made the gap larger.

Five days after the enforcement signal entered industry coverage, Bristol Myers Squibb announced its strategic agreement with Anthropic. Claude Enterprise will be deployed across the company's global operations.

The use cases are explicit. Generate a clinical study report from underlying trial data. Trace the root cause of a manufacturing deviation in real time. Make data-driven batch-release decisions.

Greg Meyers, BMS Chief Digital and Technology Officer: "Most enterprise AI stops at the chatbot. The real prize is the untapped value still trapped behind decades of data silos."

He is right about the prize. The question every sponsor needs to answer is whether the AI outputs that touch regulated artifacts can be reconstructed, traced, and inspected three years later when an FDA investigator asks. "Full enterprise governance and audit controls in place," the language used in the BMS announcement, is a marketing claim. The standards that make it inspectable are still being written.

AstraZeneca went agentic.

On May 13, AstraZeneca signed a three-year licensing deal with Owkin for its "AI Scientist" platform. Owkin will build bespoke biopharma agents, including a competitive-intelligence agent that scans clinical-trial activity, recruitment trends, and patent filings, drawing on multimodal data from more than 800 hospitals.

Owkin CEO Thomas Clozel: "the future of the pharmaceutical industry is agentic."

The deal matters for a reason that is easy to miss. Agents make decisions. When a competitive-intelligence agent concludes that a development program should be deprioritized, a chain of inferences sits behind that recommendation. Most of that chain is currently invisible. When a regulator or a board member asks why the decision was made, "the AI suggested it" will not be a defensible answer.

This is the proof-of-insight problem. The output is easy. The traceable reasoning path behind the output is the hard part. It is also the only part regulators will care about.
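A traceable reasoning path can be treated as a graph: each inference names the earlier steps it rests on, and each starting point names its source. The sketch below is a hypothetical illustration, not a description of Owkin's platform or any real agent framework; Step, DecisionTrace, and their fields are invented names. It shows the two operations that matter when someone asks why: walk a recommendation back to everything it depends on, and flag starting points that cite no source.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Step:
    # Hypothetical fields, for illustration only.
    step_id: str
    claim: str                        # what the agent asserted at this step
    based_on: Tuple[str, ...] = ()    # ids of earlier steps it depends on
    source: str = ""                  # external source, for starting points


class DecisionTrace:
    def __init__(self) -> None:
        self.steps: Dict[str, Step] = {}

    def add(self, step: Step) -> None:
        """Record a step; every step it depends on must already be recorded."""
        if step.step_id in self.steps:
            raise ValueError(f"duplicate step id: {step.step_id}")
        missing = [p for p in step.based_on if p not in self.steps]
        if missing:
            raise ValueError(f"unknown parent steps: {missing}")
        self.steps[step.step_id] = step

    def lineage(self, step_id: str) -> List[Step]:
        """All steps the given step rests on, parents first, ending with itself."""
        ordered: List[Step] = []
        seen: Set[str] = set()

        def visit(sid: str) -> None:
            if sid in seen:
                return
            seen.add(sid)
            for parent in self.steps[sid].based_on:
                visit(parent)
            ordered.append(self.steps[sid])

        visit(step_id)
        return ordered

    def unsupported(self, step_id: str) -> List[str]:
        """Starting points in the lineage that cite no external source."""
        return [s.step_id for s in self.lineage(step_id)
                if not s.based_on and not s.source]
```

Because add() rejects references to steps that have not been recorded yet, the trace cannot contain cycles, and lineage() always terminates. An empty result from unsupported() does not make a recommendation correct; it only means every starting point points at something a reviewer can check.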

The leadership vacuum complicates the picture.

Marty Makary resigned as FDA Commissioner on May 12. Kyle Diamantas is acting. FDA now has three top positions filled by acting heads.

Makary was the public face of FDA's AI-modernization push: the Chief AI Officer role (Jeremy Walsh), the Real-Time Clinical Trial pilot, the January 14, 2026 joint FDA–EMA Good AI Practice principles. Diamantas is described as a "stabilizing caretaker" with no public AI agenda.

Two things are true at once. The structural reforms (RTCT, CNPV, the Walsh AI office) are administration-backed and likely to survive. And the pace of new AI guidance will probably slow for three to six months. Sponsors who were waiting for FDA to tell them exactly what good AI looks like in regulated workflows just lost their fastest path to clarity.

That puts more weight, not less, on internal evidence infrastructure.

Underneath the deals, the science is humbling itself.

Two preprint signals deserve attention from anyone building or buying biological foundation models.

The April 17 bioRxiv benchmark tested scGPT, scGPT_CP, scFoundation, scMulan, and CellFM against scVI and Harmony on more than 1.5 million cells. Top overall performer: scVI, a 2018-era variational autoencoder. The best foundation-model variant was fine-tuned scGPT_CP, still narrower in scope than the field's marketing has implied.

A separate May 13 paper tested AlphaFold-family generative samplers (AF-cluster, AlphaFlow, ESMFlow, ConfRover, BioEmu) against AI-accelerated molecular dynamics on T4 Lysozyme. The generative models collapsed to a single "open" state. MD recovered all four physical states matched to smFRET data.

The story these papers tell is consistent. "More parameters, more biology" is being empirically pressure-tested and failing. For build-versus-license decisions, domain-specific architectures and task-level benchmarks are beating scale. Anyone who is told an scFM will improve target identification or compound triage should ask for the benchmark that beats scVI on their actual task.

The deal layer is consolidating.

Anthropic acquired Coefficient Bio for a reported ~$400M within the window. Pre-commercial biotech, fewer than ten employees. First time a horizontal AI lab has acquired a biology-native startup at scale.

The pattern is clearer every cycle. AI infrastructure deals are outpacing AI molecule deals. Lilly–NVIDIA ($1B over five years), Noetik–GSK foundation-model licensing, Lilly–Insilico ($2.75B), now BMS–Anthropic and AstraZeneca–Owkin. The bet has shifted from "this AI company will produce a drug" to "this AI infrastructure will produce many drug programs over a decade."

The investable asset is the substrate, not the molecule.

The throughline.

Speed without traceability is a liability in regulated industries. This cycle showed exactly how the bill comes due.

FDA enforced on it. A top-5 pharma committed publicly to using AI in batch release without saying how the evidence chain will work. Another top-5 committed to agents making decisions adjacent to regulatory filings. And the academic literature reminded everyone that the AI itself is not as good as the marketing.

The companies that will win the next 18 months are not the ones with the most agents or the largest foundation model. They are the ones whose AI decisions, every one of them, can be reconstructed when an inspector or a board member asks.

That is what evidence infrastructure is for.

Prior-cycle updates. RTCT RFI closes May 29; no new sponsor data on TRAVERSE or STREAM-SCLC. Recursion, Anagram, Artera, PathAI: no movement. Isomorphic Labs reportedly raising $2B+ from Thrive and Alphabet; no new IND or trial start. The scFM benchmark wave continues with no flagship release strictly within the window.