
What Does It Mean to Review an AI-Generated Document?

Your quality team signed an AI-generated deviation summary this week. The signatures are valid and the timestamps are compliant. When an FDA investigator asks the signatories to walk through what the AI considered before reaching its conclusion, they will not be able to. The audit trail will show a review. The inspector will find a signature. Those are not the same thing, and FDA has now established exactly how it will tell them apart.

Leucine Research | Apr 29, 2026 | 9 min read

Consider this as a thought experiment before anything else. A QA director at a well-run US manufacturing facility reads FDA’s first AI warning letter in April 2026. She notices the cited company had no quality unit — a categorical failure that bears no resemblance to her operation. Her facility has seventeen credentialed QA professionals. Her team reviews AI-generated deviation summaries, approves AI-assisted CAPA drafts, and signs batch record analyses every week. She files the warning letter under regulatory intelligence and returns to the forty-three AI-assisted deviation summaries pending QU review in her queue.

Now imagine that fourteen months later, during a routine inspection, an FDA investigator asks her to walk through the basis for a root cause determination from a deviation investigation closed the previous year. The AI generated the root cause summary. She reviewed it. She approved it with her electronic signature under 21 CFR Part 11. She can read the conclusion now. She approved that conclusion at the time. What she cannot do — what the system she uses has never made possible — is explain what signals the AI weighted in arriving at that conclusion, what historical cases it compared against, what alternatives it considered and dismissed, and why the root cause it selected was preferred over the others it evaluated.

The investigator does not ask whether the signature is valid. It is valid. The investigator asks one more question: can she walk him through the basis for the determination, meaning not the conclusion but the reasoning behind it? The audit trail shows a compliant signature and a timely review. It does not show a compliant review.

Sit with that distinction for a moment. It is not the scenario the industry has been preparing for since the AI warning letter was published. The scenario the industry has been preparing for involves a company with no quality unit, whose use of AI was unreviewed and ungoverned. That scenario is real, and it produced the warning letter. The question worth asking — genuinely, not rhetorically — is: what scenario produces the next one?

What exactly are you attesting to when you sign a document whose reasoning you cannot see? That question has always existed in pharmaceutical quality management. AI makes it harder to look away from.

What Has the Standard Always Required?

Consider how 21 CFR 211.22(c) applies to AI outputs — and whether the standard has changed, or only the context it governs

One way to think about FDA’s first AI warning letter is through the regulatory text it cites: 21 CFR 211.22(c), which makes the quality control unit responsible for approving or rejecting all procedures and specifications that affect the identity, strength, quality, and purity of the drug product. That provision has been in federal regulations since 1978. It has governed SOP approvals, master batch record reviews, validation protocol sign-offs, and deviation investigations for nearly five decades. It has never been a requirement for a quality unit to be physically present. It has always, at least in its plain reading, been a requirement for the quality unit to conduct a genuine review: to take substantive accountability for the accuracy and regulatory adequacy of what it approves.

Here is the interesting tension that AI introduces: not to the standard, but to the ease with which the appearance of compliance can be produced without the substance. Consider the mechanics. An AI system generates a deviation root cause summary. A credentialed QA professional reviews it, confirms it reads correctly, and signs it within the required timeframe. The audit trail is complete. The signature is valid. The timeline is within SOP. And yet — the review that 211.22(c) may have always required might not have occurred. Not because the reviewer is unqualified. Because the architecture may not have given them access to the information they would need to conduct it. Whether that constitutes the same gap the warning letter described is a genuinely open question. Worth thinking through before assuming the answer is obvious in either direction.

There is an accumulation dynamic worth considering here. When a company operates with a missing SOP or an overdue validation study, the gap is visible — it shows in the document control system, in the audit schedule, in the training matrix. The gap that AI-assisted review without reasoning access might create is invisible from the outside. Every document has a signature. Every review has a timestamp. Every workflow closed on schedule. The gap, if it exists, is not between what is documented and what happened. It is between what the audit trail asserts happened and what an inspector’s questions might reveal.

Imagine fourteen months of AI-assisted deviation summaries, CAPA approvals, and batch record reviews — each signed by credentialed QA professionals who read the outputs but could not access the reasoning. Each one a compliant-looking record. Each one a potential gap between the review the audit trail asserts and the review an inspector might probe for. The audit trail would not be fraudulent. It might be incomplete in a way that does not become apparent until examination. Whether FDA would characterise that incompleteness the same way it characterised the first warning letter is the question the industry has not yet had to answer — but may be beginning to.

Four Ways the Industry Might Have Misread the Warning Letter

The outlier framing is defensible on the surface — but is it asking the harder question the warning letter poses?

The warning letter produced a rapid and mostly unified response across the pharmaceutical industry: scrutiny, caution, and a categorical distinction between “companies like that” and “companies like us.” The company FDA cited had no quality unit. The industry has quality units. The warning letter, therefore, was about someone else. That reading is not irrational. But it may be incomplete in the way that matters for the next inspection cycle.

Four specific framings shaped the industry’s response. Each one is worth examining — not to dismiss it, but to ask whether it fully settles the question or only appears to.

Consider the Outlier Framing Carefully

The company FDA cited had no quality unit. That is an obvious failure — categorical, extreme, and easy to dismiss as irrelevant to a manufacturer with a fully staffed QA department. But here is a question worth working through: has 211.22(c) ever been a requirement for a quality unit to exist, or has it always been a requirement for the quality unit to conduct a genuine review? If the standard is about the substance of the review rather than the presence of the reviewer, then consider what it would mean for a company whose quality unit exists, is credentialed, and signs every document on time — but cannot evaluate the AI outputs it approves because the system never exposes the reasoning. Whether that is the same gap with a different presentation, or a categorically different situation, is worth thinking about rather than assuming.

What Does a Professional-Looking Output Actually Tell You?

There is an interesting tension in how AI outputs present themselves. A deviation root cause summary generated by a language model is, in many respects, indistinguishable on the surface from one written by a senior QA professional — structured, hedged, evidenced in its language. What is different, and worth thinking carefully about, is what that professional appearance does and does not tell you about the reasoning behind it. When a colleague writes a root cause summary, you can ask them to walk you through it. The question worth sitting with is whether the architecture of AI-assisted review preserves that same possibility — and what it means for the review if it does not. Output quality and reasoning quality may not be the same thing. Whether they need to be is worth examining rather than assuming away.

What Happens to Review Quality as AI Volume Scales?

Consider a QA team reviewing AI outputs at scale: deviation summaries, CAPA drafts, SOP revisions, batch record analyses. The review queue grows at the pace of AI adoption. The time available for each review does not. One natural response is faster review: more efficient workflows, better-trained reviewers, streamlined approval processes. Here is a question worth sitting with before assuming that addresses the underlying issue: if the reviewer cannot see the AI's reasoning, does review speed change what they can assess? If the constraint is access rather than time, then efficiency improvements may not change the compliance posture in the way they are intended to. That is not a settled conclusion. It is a question worth working through before the answer is assumed.

One Natural Response Is to Add Approval Steps — Worth Examining

One natural response to the warning letter is to add approval steps: a second reviewer, a director sign-off, a committee review for CAPA closures. That response is worth examining rather than accepting at face value. More eyes on an output can certainly improve the chance that an error is caught. The harder question is whether more reviewers improve things if none of them has access to the reasoning that produced the output. A second signature from a signatory who is no better positioned to assess the document than the first may be something other than additional governance. Whether it constitutes substantive remediation in the sense 211.22(c) has always required is the question additional approval steps may eventually need to answer.

Imagine Two Possible Approaches to AI-Assisted Review

Not a before-and-after — more an exploration of what different architectures make possible, and what they leave open

The comparison below is not about companies that do things right versus companies that cut corners. Consider instead two manufacturers with credentialed quality departments, compliant workflows, and signed audit trails — differing only in what their architecture makes available to the reviewer. The differences described here are possibilities worth thinking through, not certainties about how any given inspection will unfold.

What the Reviewer Has Access To

Consider This Scenario

The QU representative receives the AI-generated output — a deviation summary, CAPA draft, or batch record analysis. They read it. They assess whether it appears correct and complete. They have no access to what the AI consulted, what historical cases it compared against, what alternative root causes it evaluated and dismissed, or where its reasoning was uncertain. Their review is bounded by the visible quality of the output. Whether that is an adequate basis for the attestation their signature implies is the question this architecture leaves open.

Output only — reasoning not available to the reviewer

Now Consider This

The QU representative receives the AI-generated output alongside a record of how it was produced: regulatory sources consulted, historical cases compared, alternative conclusions evaluated, gaps identified, and confidence levels by section. They can engage with both what the AI produced and how it arrived there. Whether that changes the nature of what their sign-off attests to is the philosophically interesting question — and quite possibly the practically important one.

Output plus reasoning — reviewer can assess both

What the Audit Trail Captures

Consider This Scenario

The audit trail records a user ID, a timestamp, and an approval outcome. The signature is present; the basis for the review may or may not be recoverable from anything the system captured. An inspector can verify that a signature exists and that it was applied within the SOP timeframe. Whether they can verify that a substantive review occurred — in the sense 211.22(c) has always required — depends on what was preserved, and what questions they ask.

Signature exists; basis for the review is not in the record

Now Consider This

The audit trail captures two distinct, independently retrievable records: the AI's work — what it was asked, what it accessed, what it produced, and what it identified as uncertain — and the human's decision — what they reviewed, what they assessed, and the explicit basis for their approval or required revision. Whether this architecture fully satisfies 211.22(c) for every possible AI use case is a question regulators are still working through. What it does do is make the review demonstrable rather than merely asserted.

Two records — AI contribution and human judgment separately auditable

What Happens During Inspection

Consider This Scenario

The investigator finds compliant signatures and timely reviews. When they ask the QU representative to walk through the basis for a root cause determination, the representative can describe the output they approved. Whether the investigator accepts that as a compliant review under 211.22(c) — or whether they read it differently — is the question this architecture leaves genuinely open. It is worth thinking through before assuming the answer is obvious.

Compliant signatures; open question about demonstrability of review

Now Consider This

The investigator can retrieve the complete decision chain: what the AI considered, what the QU representative reviewed, and the explicit basis for their approval. The QU representative can describe not only the conclusion but the reasoning they evaluated and the basis on which they found it sound. Whether this fully closes the question under 211.22(c) depends on regulatory interpretation that is still developing. It does at least make the accountability the standard requires something that can be shown rather than argued.

Review is demonstrable — open to inspection, not just assertion

How Errors Have a Chance to Surface

Consider This Scenario

If the AI produces an incorrect root cause assessment in professionally written language, the reviewer without reasoning access has limited means to detect the error. They can catch what looks wrong. A plausible but unsupported root cause, a missed regulatory requirement, an assumption that does not hold — none of these necessarily look wrong in the output. They may surface at inspection, in downstream quality events, or not at all. Whether that is an acceptable detection architecture for GMP use cases is a question worth working through rather than setting aside.

Detection depends on the output appearing wrong; reasoning errors may not show in the output

Now Consider This

If the AI produces an incorrect or incomplete output, the reasoning record exposes where the gap might have originated: a regulatory reference misread, an alternative dismissed without adequate basis, a confidence assertion not warranted by the evidence. The QU representative reviewing the reasoning has a different opportunity to identify the failure point before the output governs a quality decision. Whether that constitutes a fundamentally different review, or simply a better-informed one, is a distinction worth thinking through.

Errors have a surface at the reasoning level, not only in the output

What Might Substantive AI Review Require?

Four principles worth thinking through — not as settled requirements, but as architectural questions the industry is beginning to work out

The ideas below are not a checklist. They are an attempt to think through what the architecture of AI-assisted review would need to provide if the standard it operates under has always required substantive accountability rather than procedural compliance. How those ideas translate into specific system requirements is something regulators, manufacturers, and AI developers are still working out together.

Reasoning Transparency as a Condition of Review

Consider what it would mean if AI outputs used in GMP activities were accompanied by retrievable reasoning records: what regulatory sources were consulted, what historical data was compared, what alternatives were evaluated, and where the AI's confidence was limited. One way to think about this is that a reviewer attesting to the accuracy of a conclusion they cannot trace is attesting to something different from a reviewer who can. Whether those two attestations are equivalent under 211.22(c) is the question reasoning transparency is designed to make answerable rather than arguable.

Reasoning record | AI transparency | Reviewable output
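To make the idea of a retrievable reasoning record a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the class names, the field names, the `open_questions` helper, and the 0.7 confidence threshold are hypothetical and do not describe any vendor's system or any FDA requirement. It only shows how the four elements named above (sources, historical data, alternatives, limits on confidence) could be stored so that a reviewer sees what there is to probe before signing.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    """A conclusion the AI evaluated but did not select (hypothetical shape)."""
    description: str
    dismissal_basis: str = ""  # empty string means no basis was recorded


@dataclass
class ReasoningRecord:
    """Hypothetical reasoning record stored alongside one AI-generated output."""
    sources_consulted: list[str] = field(default_factory=list)
    historical_cases: list[str] = field(default_factory=list)
    alternatives: list[Alternative] = field(default_factory=list)
    confidence_by_section: dict[str, float] = field(default_factory=dict)


def open_questions(record: ReasoningRecord, min_confidence: float = 0.7) -> list[str]:
    """List the points a reviewer may want to probe before signing.

    The default threshold is an arbitrary illustrative value.
    """
    questions = []
    if not record.sources_consulted:
        questions.append("no regulatory sources recorded")
    if not record.historical_cases:
        questions.append("no historical cases recorded")
    if not record.alternatives:
        questions.append("no alternatives recorded")
    for alt in record.alternatives:
        if not alt.dismissal_basis.strip():
            questions.append(
                f"alternative dismissed without stated basis: {alt.description}"
            )
    for section, confidence in sorted(record.confidence_by_section.items()):
        if confidence < min_confidence:
            questions.append(f"limited confidence: {section} ({confidence:.2f})")
    return questions


# Illustrative data only; the SOP identifier and root causes are made up.
record = ReasoningRecord(
    sources_consulted=["SOP-QA-001 (hypothetical)"],
    historical_cases=[],
    alternatives=[
        Alternative("equipment malfunction", "maintenance log shows no faults"),
        Alternative("raw material variability"),
    ],
    confidence_by_section={"root cause": 0.55, "impact assessment": 0.9},
)
for question in open_questions(record):
    print(question)
```

With this sample record the helper reports three items: no historical cases were recorded, one alternative was dismissed without a stated basis, and confidence in the root cause section is limited. None of those would necessarily be visible in the polished output itself.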

Architecture That Connects GMP Expertise to AI Reasoning

Here is an interesting architectural puzzle worth sitting with: QA professionals reviewing AI outputs bring deep GMP expertise to the review. But that expertise cannot reach reasoning they cannot see. A deviation investigator who can see which historical cases the AI compared, and what it dismissed, can apply their quality judgment to that comparison. Without that access, their expertise operates only on the output — the conclusion — rather than the reasoning that produced it. One property an architecture might need, if the review is to be substantive, is a way to connect the reviewer's expertise to the AI's reasoning rather than only its results.

QU accountability | Substantive review | GMP competence

The Dual Record Idea

One architectural principle worth thinking through is whether AI deployments in pharmaceutical manufacturing might benefit from two independently auditable records rather than one: the AI's contribution — what it was given, what it accessed, what it produced, what it flagged as uncertain — and the human's decision — what they reviewed, what they assessed, and the explicit basis for their approval or required revision. A single audit trail entry capturing a timestamp and approval outcome may assert that a substantive review occurred. Whether it can demonstrate it — in the sense an inspector can evaluate — is the question the dual record idea tries to address.

21 CFR Part 11 | AI work record | Decision traceability
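The dual record idea can be sketched as two separate, immutable records linked by a content hash. This is a hedged illustration, not a Part 11 implementation: the class names, field names, identifiers, and the `is_demonstrable` check are all hypothetical, and whether a structure like this would satisfy any regulatory expectation is exactly the open question discussed above.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AIWorkRecord:
    """What the AI was given, accessed, produced, and flagged (hypothetical fields)."""
    record_id: str
    prompt: str
    sources_accessed: tuple[str, ...]
    output_text: str
    flagged_uncertainties: tuple[str, ...]

    def digest(self) -> str:
        """SHA-256 over a canonical JSON form, so later changes are detectable."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HumanDecisionRecord:
    """What the reviewer examined and why they approved or required revision."""
    reviewer_id: str
    ai_record_id: str
    ai_record_digest: str  # ties the decision to the exact AI record reviewed
    items_reviewed: tuple[str, ...]
    outcome: str  # "approved" or "revision_required"
    basis: str


def is_demonstrable(ai: AIWorkRecord, decision: HumanDecisionRecord) -> bool:
    """True if the decision points at this unaltered AI record and states a basis."""
    return (
        decision.ai_record_id == ai.record_id
        and decision.ai_record_digest == ai.digest()
        and bool(decision.items_reviewed)
        and bool(decision.basis.strip())
    )


# Illustrative data only; identifiers and text are made up.
ai = AIWorkRecord(
    record_id="DEV-0001-ai",
    prompt="Summarise probable root cause for deviation DEV-0001",
    sources_accessed=("batch record", "equipment log"),
    output_text="Probable root cause: incomplete line clearance.",
    flagged_uncertainties=("no operator interview on file",),
)
signed_with_basis = HumanDecisionRecord(
    reviewer_id="qa.reviewer",
    ai_record_id=ai.record_id,
    ai_record_digest=ai.digest(),
    items_reviewed=("output", "sources accessed", "flagged uncertainties"),
    outcome="approved",
    basis="Equipment log rules out malfunction; interview gap tracked in CAPA.",
)
signature_only = HumanDecisionRecord(
    reviewer_id="qa.reviewer",
    ai_record_id=ai.record_id,
    ai_record_digest=ai.digest(),
    items_reviewed=(),
    outcome="approved",
    basis="",
)
print(is_demonstrable(ai, signed_with_basis))  # True
print(is_demonstrable(ai, signature_only))     # False
```

Both decision records carry a valid reviewer, outcome, and link to the AI record. Only the first also records what was reviewed and why, which is the difference between a review that is asserted and one that can be shown.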

What Qualification Means for AI-Assisted Review

There is a question worth sitting with about what it means to be qualified to review an AI output in a GMP context. GMP expertise without reasoning access produces a reviewer who is qualified to assess quality but cannot see what they are assessing. Reasoning access without GMP expertise produces a reviewer who can read the AI's logic but cannot evaluate its regulatory adequacy. One way to think about this is that both may be required — and that qualification frameworks for AI-assisted workflows may need to account for both in ways that current frameworks were not designed to address. That is not a criticism of any specific approach. It is a question the industry is in the early stages of working out.

Inspection readiness | Reviewer qualification | Human-in-the-loop

What FDA’s first AI warning letter may be establishing (and it is worth being tentative here, because regulatory interpretation develops slowly and through specific cases) is that the agency is beginning to ask an old question in a new context: “Walk me through the basis for this procedure.” That question has always been latent in 211.22(c). What AI deployment does is change how often the honest answer to that question requires access to information the system may not have preserved.

The companies that receive FDA’s next AI-related enforcement actions will almost certainly not look like the first one. They will have quality units. They will have credentialed QA professionals who read the same warning letter and concluded, reasonably, that their situation was categorically different. Their approval workflows will show timely reviews. Their audit trails will show compliant signatures. Whether those audit trails can demonstrate the substantive review the standard has always required — when an inspector asks the right question — is something each organisation is implicitly deciding now, through the architectural choices embedded in how they deploy AI.

The distinction between approving an output and reviewing it is not new. It has always existed in pharmaceutical quality management. What AI does is make the gap between the two more consequential and harder to ignore. How the industry chooses to close that gap will say something about what “review” has always meant.

Reading a document and reviewing it have always been different things. In most of pharmaceutical quality management, the gap between the two was narrow enough to be workable. The interesting question AI poses is whether the architecture of AI-assisted workflows preserves that workability — or quietly widens a gap that was already there.
