AI Governance · Professional Responsibility · Commentary

Governance, Not Guardrails

Anthropic announced a three-layer guardrail stack to prevent hallucinated citations. When the bar opens a grievance, it will not ask what the guardrails did. It will ask what you did.

What Anthropic Said About Hallucinations

On May 16, 2026, Anthropic ran its second Claude for Legal training webinar, led by Mark Pike, Associate General Counsel, and Harry Liu, Applied AI. The session was substantive. The Q&A addressed hallucinations directly. Pike described a three-layer mitigation stack inside Claude.

One. A cold-start interview that captures firm context up front, so Claude has the practice profile before it generates work product.

Two. Source citations on each cell of structured output, with a flag rather than a fabrication when verification fails.

Three. Guardrails that alert the lawyer when Claude is making assumptions, so the lawyer can manually verify before the work leaves the firm.

Pike framed the goal directly: “We want to make sure you’re not the next lawyer who ends up on the naughty list of people who submitted hallucinated citations in a courtroom filing.”

The goal is correct. The mechanism is a guardrail system. It is not a governance program. The two protect different things.

The Distinction

Guardrails are vendor-built. They sit inside the tool. They change when the model updates. They are opaque to you, to your ethics counsel, and to any grievance committee. When they work, you do not see them work. When they fail, the failure shows up in your filing, not on the vendor’s status page.

Governance is firm-built. It sits around the tool. It is documented, versioned, dated, and auditable. It defines what your firm does before Claude touches a matter and after the output leaves Claude. It is the artifact your bar will recognize as evidence that you took your competence, confidentiality, and supervision obligations seriously.

Both have value. Neither substitutes for the other. Buying a tool with sophisticated guardrails does not produce a governance program any more than buying a malpractice policy produces a calendaring system.

Three Layers Inside One Model Is Still One Auditor

Every one of Pike’s three mitigation layers runs inside Claude. Claude collects the firm context. Claude generates the citation. Claude flags its own assumptions. That is one auditor running three sub-checks, not three independent checks. The structural argument is laid out in detail in “You Don’t Ask the Liar If He’s Lying.” The short version: a model that produces work cannot reliably audit that work, because the same training data, the same inference patterns, and the same blind spots that shaped the output also shape the audit.

The Sullivan & Cromwell hallucination matter is the cautionary case in commercial practice. A sophisticated firm with internal review still filed AI-generated citations that did not exist. Internal review failed not because the lawyers were careless, but because the review sat downstream of the same workflow that produced the error. Independent verification is the only kind that catches that class of mistake.

The point applies with equal force to a one-attorney firm relying on Claude’s own guardrails to flag Claude’s own assumptions.

The Empirical Record

This is not a hypothetical concern. A research team at Stanford and Yale (Magesh, Surani, Dahl, Suzgun, Manning, and Ho) conducted the first pre-registered empirical evaluation of leading legal AI products, released as a preprint in May 2024 and published after peer review as "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools," 22 J. Empir. Leg. Stud. 216 (2025). The study tested Lexis+ AI, Thomson Reuters's Ask Practical Law AI, Westlaw's AI-Assisted Research, and GPT-4 against 202 hand-constructed legal queries.

LexisNexis had marketed Lexis+ AI as delivering “100% hallucination-free linked legal citations.” Thomson Reuters described its system as one that avoids hallucinations by relying on trusted Westlaw content. Stanford measured what actually happened.

Lexis+ AI hallucinated on 17% of queries. Westlaw AI-Assisted Research hallucinated on 33%. Ask Practical Law AI, which draws only from Thomson Reuters’s curated practice-guide database, hallucinated on 19% and refused to answer 62% of the time, producing accurate responses on just 18% of queries. The study’s conclusion: claims of hallucination-free legal AI systems are, at best, ungrounded.

Two findings deserve direct attention here.

First, the Stanford team explicitly addressed the auditor problem. They considered and rejected the methodology of letting legal AI tools check themselves, noting that the technique is unsuitable for legal evaluation precisely because adherence to authority is so important in legal writing and research. The same logic applies to legal production. A model that checks its own assumptions is not performing an independent check.

Second, the study documented that vendor marketing language and empirical reality were not aligned. Each vendor had publicly described its system as hallucination-free or nearly so. Each vendor’s system, when measured, hallucinated at rates between 17% and 33%. The marketing claim and the measurement did not match.

A March 2025 randomized controlled trial by Schwarcz, Manning, Prescott, and colleagues sharpens this picture rather than contradicting it. The study, “AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice,” assigned 137 upper-level law students at the University of Minnesota and the University of Michigan to complete six realistic legal tasks using either a RAG-based legal AI tool (Vincent AI), an AI reasoning model without RAG (OpenAI’s o1-preview), or no AI. Both AI tools produced meaningful quality and productivity gains over the no-AI baseline. Vincent AI did not introduce additional hallucinations. The reasoning model, used without RAG, did. Tools vary, architectures vary, task type matters, and the hallucination rate depends on choices the firm makes about which tool to use for which task. Vendor guardrails do not make those choices for the firm.

Anthropic’s Claude for Legal launch comes with comparable mitigation language. The plugins are new, and no independent benchmarking has been published yet. The reasonable prior, based on the empirical record from sophisticated RAG systems with proprietary verification layers, is that the hallucination rate will not be zero. The reasonable governance posture is to plan for that.

Who the Bar Actually Asks

When the Florida Bar opens a grievance, it does not subpoena Anthropic. It asks the attorney for documentation.

Under Rule 4-1.1, it asks whether the attorney maintained competence in the use of the tool, including its limitations.

Under Rule 4-1.6, it asks whether confidential client information was protected in transit and at rest, and whether the vendor’s data handling terms were reviewed and acceptable.

Under Rule 4-5.3, it asks whether non-lawyer staff and contractors using the tool were properly supervised, with written protocols and training records.

Florida Bar Ethics Opinion 24-1 places the obligation to understand the tool and its data flows on the lawyer. ABA Formal Opinion 512 reaches the same conclusion under the Model Rules. Florida Supreme Court SC2024-0032, effective October 28, 2024, codifies the supervisory expectation at the rule level. The 11th and 17th Circuit standing orders on AI disclosure, in effect since January 2026, add filing-level certifications. None of these authorities ask what the vendor’s guardrails did. All of them ask what the lawyer did.

“The guardrail did not catch it” is not a defense. “Here is my written policy, here is my supervision protocol, here is my independent verification step, here is my training log” is a defense.

What Governance Looks Like for a Small Firm

A working governance program for a one-to-ten-attorney firm does not require an IT department or a six-figure compliance budget. It requires the following, written down, dated, and reviewable on demand:

1 · AI Use Policy

Specifies which tools are approved, which tasks are permitted, and which are prohibited, including a clear rule against putting client-identifiable data into consumer-tier products.

2 · Independent Verification

If Claude drafted the brief, something other than Claude verifies the citations against primary sources: CourtListener, Westlaw, Lexis, or human review. Not a second Claude session inside the same vendor's stack.
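
To make "independent" concrete, here is a minimal sketch of an automated first-pass check in Python. It assumes CourtListener's citation-lookup REST endpoint and the response fields noted in the comments; the endpoint URL, field names, and status codes are stated from memory, not from the vendor's spec, so confirm them against the current Free Law Project API documentation before relying on this. The structural point is what matters: the verifier is a different system from the drafter.

import requests

# Free Law Project's citation-lookup endpoint (assumed; confirm against
# the current CourtListener API docs before relying on it).
COURTLISTENER_URL = "https://www.courtlistener.com/api/rest/v3/citation-lookup/"

def flag_unverified_citations(draft_text: str, api_token: str) -> list[dict]:
    """Have CourtListener parse the draft's citations and return every
    citation that did not resolve to a known opinion."""
    resp = requests.post(
        COURTLISTENER_URL,
        data={"text": draft_text},
        headers={"Authorization": f"Token {api_token}"},
        timeout=30,
    )
    resp.raise_for_status()
    flagged = []
    for hit in resp.json():
        # Assumed response shape: one object per citation found, with a
        # per-citation "status" (200 = matched a real opinion) and the
        # matching "clusters". Anything else goes to human review.
        if hit.get("status") != 200 or not hit.get("clusters"):
            flagged.append(hit)
    return flagged

Anything the check flags goes to a human with the primary sources open. The automated pass is a filter, not a sign-off; the verification step in the audit trail still names the person who confirmed each authority.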

3 · Supervision Protocol (Rule 5.3)

Covers attorneys, paralegals, and contractors who use AI on firm matters, with documented training and periodic refresh.

4 · Competence Record (Rule 1.1)

Documents each attorney’s training, ongoing education, and the basis for the judgment that they are competent to use the tool for the task at hand.

5 · Confidentiality Safeguard (Rule 1.6)

Covers vendor data handling terms, retention, incident response, and the firm’s evaluation that the vendor’s terms are acceptable for the data being processed.

6 · Audit Trail

Records which tool was used, for what matter, when, by whom, and what verification step was applied before the work product left the firm.
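
A minimal sketch of such a record, using nothing beyond the Python standard library. The field names are illustrative, not a standard; what matters is that every entry answers the questions a grievance committee will ask: which tool, which matter, who, when, and what verification happened before release.

import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AIUseRecord:
    matter_id: str     # firm's matter number
    tool: str          # e.g. "Claude for Legal"
    task: str          # e.g. "draft motion to compel"
    user: str          # attorney or staff member who ran the tool
    timestamp: str     # ISO 8601, UTC
    verification: str  # independent check applied before release
    verified_by: str   # person who signed off on the verification

def log_use(record: AIUseRecord, path: str = "ai_audit_log.jsonl") -> None:
    """Append one JSON line per AI use, producing a log the firm can
    hand over on demand."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_use(AIUseRecord(
    matter_id="2026-0417",
    tool="Claude for Legal",
    task="draft motion to compel",
    user="associate@firm.example",
    timestamp=datetime.now(timezone.utc).isoformat(),
    verification="citations checked against CourtListener and Westlaw",
    verified_by="supervising.partner@firm.example",
))

A spreadsheet with the same columns works just as well; the format is irrelevant, the habit is not.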

These artifacts are what your bar will ask for if a grievance lands. They are not what Anthropic provides. Anthropic provides the tool. The governance program is yours to build.

The Bottom Line

Anthropic has built sophisticated infrastructure. The plugins, the connectors, the managed agents, and the Microsoft 365 integration are substantive, and lawyers will get real work done with them. The guardrails will catch errors that would otherwise reach a filing. None of that changes the structural fact that the bar regulates lawyers, not vendors. The tool comes with guardrails. The governance program has to come from you.

If your firm has not yet written its AI use policy, its supervision protocol, and its independent verification procedure, the question is not whether Anthropic’s guardrails are good enough. The question is what you will hand the grievance committee if they ask.

For the deeper structural argument on why a model that produces work cannot reliably audit its own work, read “You Don’t Ask the Liar If He’s Lying.”

Nothing in this article constitutes legal advice. Jurisdiction-specific ethics analysis is required before relying on any practice described here.

JDAI helps law firms develop AI governance frameworks, from policy drafting and tool evaluation through attorney training and ongoing compliance support.
