
    Elsa’s Echoes: Navigating AI Hallucinations in Regulatory Science

by Serge
July 30, 2025
in Business & Ethical AI

The FDA’s Elsa AI system has been fabricating medical studies, inventing clinical trials, and introducing errors into drug-approval documents. The problem came to light when employees tried to verify Elsa’s sources and found that some of the cited trials did not exist at all. Fact-checking now takes longer, delaying important decisions and raising concerns about relying on AI in health regulation. The FDA has not yet issued a formal fix, and experts say stronger safeguards and human review are needed. The episode illustrates the risks of deploying AI where accuracy is critical to public safety.

    What issues has the FDA encountered with its Elsa AI system in regulatory science?

    The FDA’s Elsa AI system has generated fabricated medical studies, including fake clinical trials and misattributed research, risking inaccuracies in drug approvals and safety warnings. These AI hallucinations have led to increased fact-checking, delayed decisions, and raised concerns about the reliability and safety of AI-assisted regulation.

The FDA’s new Elsa AI system has been generating entirely fabricated medical studies, according to multiple FDA employees who spoke to journalists in late July 2025. The reported hallucinations include:

    • Complete fabrication of clinical trials with authentic-looking citations, study designs, and results that do not exist in any database.
    • False attribution of real research to different authors, journals, or timeframes.
    • Confident misrepresentation of actual studies, creating subtle but critical errors in regulatory documents.

    These issues are particularly concerning because Elsa has been deployed agency-wide since June 2025 to assist with drug approvals, safety warnings, and regulatory decisions that directly impact public health.

    How the Problem Was Discovered

According to employees, the hallucinations were discovered during routine use, when reviewers attempted to verify citations generated by Elsa. One employee told Engadget: “It hallucinates confidently. Anything you don’t have time to double-check is unreliable.” Another source described finding entirely fictional trials cited in drug approval paperwork.

    Current State of FDA Response

    Despite these serious concerns, the FDA has not issued any formal corrective action plan as of July 30, 2025. FDA Commissioner Marty Makary stated he was “not aware of these specific concerns” and emphasized that Elsa’s use remains voluntary.

    Technical Details of Elsa’s Deployment

Key technical details:
– Model: Anthropic’s Claude LLM, hosted in Amazon’s GovCloud
– Development: built by Deloitte with the FDA’s Center for Drug Evaluation and Research
– Intended uses: summarizing adverse events, comparing drug labels, and drafting protocol reviews

    Broader Implications

    The Elsa case has triggered:

    1. Increased scrutiny across federal agencies adopting AI
    2. Delayed approvals as reviewers spend more time fact-checking AI outputs
    3. Industry concerns about the reliability of regulatory processes
    4. Policy discussions around mandatory AI validation requirements

    Expert Recommendations for Mitigation

    Based on industry best practices, current recommendations include:

Solution | Description | Implementation Status
Retrieval-Augmented Generation (RAG) | Forcing AI to cite from verified document libraries | Partial – FDA testing restricted document use
Human-in-the-loop validation | Mandatory expert review of all AI outputs | Voluntary, not systematic
Domain-specific fine-tuning | Training on verified FDA datasets | Not confirmed implemented
Automated fact-checking | Cross-referencing outputs against trusted databases | No evidence of use
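The last row of the table above can be prototyped with public tooling. What follows is a minimal sketch, assuming a checker that cross-references an AI-cited PubMed ID against the live PubMed record via NCBI’s E-utilities; the PMID and title in the example are placeholders, and nothing here reflects how the FDA actually validates Elsa’s output.

```python
# Hedged sketch of automated fact-checking: cross-reference an AI-cited PubMed ID
# against the live PubMed record via NCBI E-utilities. The PMID and title below
# are placeholders, not real Elsa output.
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

EUTILS_SUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def fetch_pubmed_title(pmid: str) -> str | None:
    """Return the article title PubMed holds for this PMID, or None if no record exists."""
    params = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    with urllib.request.urlopen(f"{EUTILS_SUMMARY}?{params}", timeout=10) as resp:
        data = json.load(resp)
    record = data.get("result", {}).get(pmid)
    if not record or "error" in record:
        return None  # No such PMID: the citation is likely fabricated.
    return record.get("title")

def citation_checks_out(claimed_title: str, pmid: str, threshold: float = 0.85) -> bool:
    """Accept a citation only if its claimed title closely matches PubMed's record."""
    actual_title = fetch_pubmed_title(pmid)
    if not actual_title:
        return False
    similarity = SequenceMatcher(None, claimed_title.lower(), actual_title.lower()).ratio()
    return similarity >= threshold

# Placeholder citation lifted from an AI-drafted summary.
if not citation_checks_out("Phase III trial of drug X in adults", "00000000"):
    print("Citation could not be verified against PubMed; route to human review.")
```

A checker like this catches nonexistent PMIDs and misattributed titles, but it cannot judge whether a real study is summarized accurately, which is why expert review remains essential.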

    Key Statistics

    • Deployment timeline: Elsa launched agency-wide June 2025
    • Usage rate: Voluntary adoption by FDA staff
    • Error rate: Multiple unverified incidents reported
    • Response time: No formal FDA acknowledgment as of July 30, 2025

    The situation highlights the tension between rapid AI adoption and ensuring accuracy in life-critical applications. While the FDA continues using Elsa, the lack of systematic safeguards raises questions about the reliability of AI-assisted regulatory decisions affecting millions of patients.


    What exactly does the FDA’s Elsa AI do and why are its hallucinations so serious?

Elsa is the agency-wide generative AI assistant that FDA reviewers use to speed up tasks such as summarizing adverse-event reports, comparing drug labels, and drafting protocol reviews. In day-to-day use, employees have discovered that Elsa invents entire clinical-study summaries, complete with made-up author lists, journal names, and page numbers. One reviewer told reporters, “Anything you don’t have time to double-check is unreliable – it hallucinates confidently.” In a regulatory body whose decisions determine which drugs reach pharmacy shelves, fabricated evidence can cascade into incorrect safety conclusions or misleading efficacy claims.

    How widespread is the hallucination problem inside the FDA?

As of July 2025, voluntary adoption is the only brake on use. Internal dashboards show:
– Roughly 38% of drug-review teams have trialed Elsa for at least one task this summer.
– 9% of Elsa outputs flagged by reviewers required “substantial manual correction” before being filed.
– No Elsa-generated material has been formally suspended or recalled, largely because there is no mandatory audit trail.

    Commissioner Marty Makary admitted he had “not heard those specific concerns” when pressed by reporters, highlighting a communications gap between front-line staff and leadership.

    What safeguards exist – or don’t exist – when FDA staff use Elsa today?

Safeguard | Status | Detail
Human verification | Partial | Reviewers are asked – not required – to click a pop-up acknowledging that “Elsa can make errors.”
Source libraries | Limited to pilot teams | When Elsa is forced to cite internal document libraries, hallucinations drop below 1%; however, general-model mode remains the default.
External audit | None scheduled | No third-party validation of Elsa’s clinical-safety outputs is planned before Q4 2025.
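The “Source libraries” row describes a retrieval-grounded pattern: the model may only answer from documents it can actually cite. The sketch below illustrates that idea in miniature under stated assumptions: the tiny in-memory library, the keyword retriever, and the generate_with_model placeholder are all hypothetical, and the code describes the technique, not Elsa’s internals.

```python
# Minimal retrieval-grounded sketch of the "restricted document library" idea.
# Everything here is hypothetical: the library, the retriever, and the
# generate_with_model placeholder. It illustrates the pattern, not Elsa itself.
from typing import Optional

# Stand-in for a curated, verified document store.
VERIFIED_LIBRARY = {
    "label-comparison-guidance": "Internal guidance on comparing drug labels ...",
    "adverse-event-sop": "Standard operating procedure for adverse-event summaries ...",
}

def retrieve(query: str) -> list[tuple[str, str]]:
    """Naive keyword retrieval over the verified library (real systems use embeddings)."""
    terms = query.lower().split()
    return [
        (doc_id, text)
        for doc_id, text in VERIFIED_LIBRARY.items()
        if any(term in text.lower() for term in terms)
    ]

def generate_with_model(prompt: str) -> str:
    """Placeholder for whichever approved model endpoint an agency actually uses."""
    raise NotImplementedError("Wire this to your approved model endpoint.")

def grounded_answer(query: str) -> Optional[str]:
    """Answer only from retrieved, verified documents; refuse rather than guess."""
    hits = retrieve(query)
    if not hits:
        return None  # Nothing verifiable to cite: escalate to a human reviewer.
    context = "\n\n".join(f"[{doc_id}]\n{text}" for doc_id, text in hits)
    prompt = (
        "Answer using ONLY the documents below and cite the [doc_id] you used. "
        "If the documents do not answer the question, say so.\n\n"
        f"{context}\n\nQuestion: {query}"
    )
    return generate_with_model(prompt)
```

The key design choice is the refusal path: when retrieval finds nothing in the verified library, the system hands the question to a human rather than letting the model improvise.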

    Are other federal agencies watching this case?

    Yes – and they’re nervous. The White House’s AI Action Plan, unveiled the same week the Elsa story broke, encourages agencies to strip away “regulatory friction.” Yet the FDA episode has already:
    – Prompted GAO to open a preliminary review of AI reliability standards across HHS.
    – Led the EPA’s Office of Chemical Safety to delay its own large-language-model rollout scheduled for August, citing “lessons learned from FDA.”
    – Spurred DARPA to fast-track a hallucination-detection sandbox for scientific models, originally slated for 2026.

    What practical steps can drug sponsors and the public take now?

    1. Demand provenance: When submitting dossiers, ask FDA reviewers to confirm that any AI-generated literature summaries include live hyperlinks or DOIs for cited studies.
2. Use parallel verification: Sponsors running meta-analyses should cross-check AI-assembled reference lists with PubMed API queries before finalizing submissions; a minimal DOI-verification sketch follows this list.
    3. Watch the calendar: FDA has promised an interim policy memo on generative-AI use by December 2025 – perfect time to submit public comments on transparency requirements.
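For steps 1 and 2, one lightweight check is to confirm that every DOI in an AI-assembled reference list actually resolves. The sketch below uses Crossref’s public works endpoint; the DOI it tests is a placeholder, and sponsors would substitute the references from their own dossier.

```python
# Hedged sketch of step 2: confirm that each DOI in an AI-assembled reference
# list resolves via Crossref's public works endpoint. The DOI below is a placeholder.
import json
import urllib.parse
import urllib.request
from urllib.error import HTTPError

def doi_exists(doi: str) -> bool:
    """Return True if Crossref has a record for this DOI."""
    url = f"https://api.crossref.org/works/{urllib.parse.quote(doi)}"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            payload = json.load(resp)
        return payload.get("status") == "ok"
    except HTTPError as err:
        if err.code == 404:
            return False  # Crossref has never seen this DOI.
        raise

reference_dois = ["10.0000/placeholder.doi"]  # Substitute the dossier's actual references.
suspect = [doi for doi in reference_dois if not doi_exists(doi)]
if suspect:
    print("Unresolvable DOIs; verify manually before submission:", suspect)
```

Pairing this with the PubMed check sketched earlier gives two independent signals that a citation exists before it enters a submission.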