10 Common LLM Hallucination Patterns and How to Catch Them

March 12, 2025 · 6 min read

Hallucinations aren't random — they follow recognizable patterns. Once you know what to look for, you can build detection rules, train reviewers to spot them faster, and systematically reduce the error rate in your AI outputs.

Here are the ten most common hallucination patterns we see across production deployments, ranked by frequency.

1. Fabricated Citations

The model references a study, paper, or report that doesn't exist. The citation looks real — proper formatting, plausible journal name, realistic authors — but the source is entirely fictional. This is the most common hallucination pattern, appearing in roughly 12% of research-oriented outputs.

How to catch it: Require reviewers to verify at least one citation per output against the original source. Automated tools can check DOI existence, but human judgment is needed to confirm the source actually says what the model claims.
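The automated half of this check might look like the following. It is a minimal Python sketch, assuming citations include DOIs in the text. The names `extract_dois` and `unresolved_dois` are hypothetical, and `resolves` stands in for whatever lookup you use (for example, a request to a DOI resolver), which is not shown here.

```python
import re
from typing import Callable, Iterable

# Heuristic pattern for modern DOIs: "10.", a 4-9 digit registrant, "/", a suffix.
# It can miss unusual DOIs, so treat a non-match as "unknown", not "fake".
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

def extract_dois(text: str) -> list[str]:
    """Return DOIs found in text, in order, with trailing punctuation stripped."""
    return [m.group(0).rstrip(".,;)") for m in DOI_PATTERN.finditer(text)]

def unresolved_dois(dois: Iterable[str], resolves: Callable[[str], bool]) -> list[str]:
    """Return the DOIs for which the caller-supplied lookup finds no record."""
    return [d for d in dois if not resolves(d)]
```

A DOI that resolves only shows that the source exists. Whether it supports the claim is still a reviewer's call.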

2. Numeric Inaccuracy

The model reports a real quantity with the wrong value: it states that a company had $2.3 billion in revenue when the actual figure was $2.8 billion, or claims a study found a 47% improvement when the real number was 42%. The numbers are close enough to feel right but wrong enough to matter.

How to catch it: Flag any output containing specific numbers for automated fact-checking against known databases. For numbers not in databases, human verification is essential.
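Once a number has been extracted and matched to a known quantity, the comparison itself is simple. The sketch below assumes that extraction has already happened. `REFERENCE` is a hypothetical in-memory stand-in for a real database, and the key name and the 1% tolerance are illustrative choices, not recommendations.

```python
import math

# Hypothetical reference data; in practice this would be a database lookup.
REFERENCE = {"exampleco_revenue_usd_billions": 2.8}

def check_number(key: str, claimed: float, rel_tol: float = 0.01) -> str:
    """Classify a claimed figure as 'ok', 'mismatch', or 'needs_human'."""
    if key not in REFERENCE:
        return "needs_human"  # no trusted value available: route to a reviewer
    expected = REFERENCE[key]
    return "ok" if math.isclose(claimed, expected, rel_tol=rel_tol) else "mismatch"
```

With the revenue example above, a claimed 2.3 against a reference of 2.8 comes back as a mismatch, while a figure with no reference entry is routed to a human.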

3. Attribute Misassignment

The model correctly identifies a fact but attributes it to the wrong entity. "Google acquired Slack" instead of "Salesforce acquired Slack." The fact pattern is real; the attribution is wrong.

How to catch it: Reviewers should spot-check entity-relationship pairs, especially for well-known companies, people, and events. These errors are fast to catch because the correct information is usually obvious.

4. Temporal Confusion

The model states that something happened in 2024 when it actually happened in 2023, or claims an event is upcoming when it already occurred. Dates and timeframes are particularly error-prone because they require precise knowledge that the model may have only partially encoded.

How to catch it: Any output referencing specific dates or timeframes should be verified. Build a list of key dates for your domain and cross-reference automatically where possible.
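A key-date cross-reference can start as a simple lookup. This sketch assumes a hand-maintained mapping from event phrases to years. `KEY_DATES`-style input and the function name `date_conflicts` are hypothetical, and the matching is a crude heuristic: it flags any differing four-digit year in a sentence that mentions the event.

```python
import re

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

def date_conflicts(text: str, key_dates: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (event, stated_year, known_year) where a sentence that mentions
    a known event also contains a year different from the known one."""
    conflicts = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for event, known_year in key_dates.items():
            if event not in lowered:
                continue
            for match in YEAR.finditer(sentence):
                stated = int(match.group(0))
                if stated != known_year:
                    conflicts.append((event, stated, known_year))
    return conflicts
```

Because unrelated years in the same sentence are flagged too, the result is best treated as a queue for reviewers rather than a verdict.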

5. Plausible but False Technical Claims

The model generates technically sophisticated statements that are subtly wrong. "PostgreSQL uses MVCC and implements its Repeatable Read level with snapshot isolation" is true. "PostgreSQL uses MVCC with serializable isolation by default" is false: the default isolation level is Read Committed. These are the hardest hallucinations to catch because they sound like expert knowledge.

How to catch it: Domain experts are essential here. Automated checks can catch surface-level errors, but subtle technical inaccuracies require someone who actually knows the subject matter.

6. Invented Statistics

The model generates a specific statistic — "73% of enterprises" or "4.2x improvement" — that sounds authoritative but has no source. Unlike numeric inaccuracy (where a real number is distorted), invented statistics are created from whole cloth.

How to catch it: Any output containing a specific percentage or multiplier needs a source. If no source is provided, flag it. If a source is provided, verify it.
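The "no source, flag it" rule is easy to approximate with pattern matching. In this sketch the citation markers (a bracketed number, a URL, or the phrase "according to") are assumptions about how your outputs cite sources, and `unsourced_stats` is a hypothetical name.

```python
import re

# A number followed by "%", "percent", or a multiplier "x" (e.g. "73%", "4.2x").
STAT = re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|percent\b|x\b)", re.IGNORECASE)
# Assumed citation markers; adjust to the citation style your outputs use.
SOURCE = re.compile(r"\[\d+\]|https?://|according to", re.IGNORECASE)

def unsourced_stats(text: str) -> list[str]:
    """Return sentences with a percentage or multiplier but no citation marker."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if STAT.search(s) and not SOURCE.search(s)]
```

A sentence that passes this check only has something that looks like a source. Verifying that source is a separate step.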

7. Overgeneralization

The model takes a specific finding and applies it too broadly. A study about GPT-4's performance on medical licensing exams becomes "LLMs outperform doctors on medical diagnostics." The initial fact is real; the conclusion is unsupported.

How to catch it: Reviewers should check that conclusions are proportionate to the evidence cited. This requires reading comprehension more than fact-checking — a different skill set.

8. Confident Hedging

The model says "it is well established that" or "research clearly shows" about claims that are actually contested, preliminary, or unverified. The confidence language masks genuine uncertainty.

How to catch it: Train reviewers to flag certainty language. Any claim described as "well established" or "proven" should be verifiable through multiple independent sources.
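A phrase scan can pre-highlight certainty language so reviewers see it immediately. The phrase list below is an assumed starting point built from the examples above, and `flag_certainty` is a hypothetical name. Extend the list with phrases from your own outputs.

```python
import re

# Assumed starter list of certainty phrases; not exhaustive.
CERTAINTY = re.compile(
    r"\b(well[- ]established|clearly shows?|proven|definitively)\b",
    re.IGNORECASE,
)

def flag_certainty(text: str) -> list[tuple[str, int]]:
    """Return (phrase, character offset) for each certainty phrase found."""
    return [(m.group(0).lower(), m.start()) for m in CERTAINTY.finditer(text)]
```

The scan only locates the language. Deciding whether the underlying claim really is settled remains a human judgment.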

9. Outline Hallucination

The model generates a structure — table of contents, numbered list, section headers — that promises content it doesn't deliver. The outline looks comprehensive, but the actual content is thin or missing.

How to catch it: Compare the outline to the content. Does each section deliver on its promise? This is a quick human check that automated tools struggle with.

10. Context Window Overflow

When inputs are long, models can quietly drop details from earlier parts of the context. A summary of a 5-page document might omit the third page entirely. The output reads fine but is materially incomplete.

How to catch it: For long inputs, require reviewers to verify that all key points are represented in the output. Checklists work well here — list the expected topics and confirm each one appears.
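The checklist can be pre-filled automatically before a reviewer looks at the output. This is a rough sketch that assumes each expected topic can be recognized by a few keywords. `missing_topics` and the example checklist are hypothetical, and keyword matching will miss paraphrases.

```python
def missing_topics(output: str, expected_topics: dict[str, list[str]]) -> list[str]:
    """Return topics for which none of the listed keywords appears in the output."""
    lowered = output.lower()
    return [
        topic
        for topic, keywords in expected_topics.items()
        if not any(keyword.lower() in lowered for keyword in keywords)
    ]

# Example: a summary that silently skips the security section.
checklist = {
    "pricing": ["price", "pricing"],
    "security": ["encryption", "security"],
    "timeline": ["deadline", "timeline"],
}
summary = "The proposal covers pricing tiers and the delivery timeline."
print(missing_topics(summary, checklist))
```

Topics reported as missing are candidates for the reviewer to confirm, since the summary may cover them in words the keyword list does not include.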

Building Detection Into Your Workflow

You don't need to catch all ten patterns every time. Instead:

Categorize your outputs: which patterns are most likely for your use case?
Train reviewers on your top three patterns: specialization beats generalization for review quality.
Automate what you can: citation checking, numeric verification, and date validation can be partially automated.
Track patterns over time: if fabricated citations spike after a model update, you want to know immediately.
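Tracking patterns over time can begin as a comparison of flag rates before and after a change such as a model update. The sketch below assumes reviewers or automated checks attach pattern labels to each output. The function names, label strings, and thresholds (`ratio`, `min_rate`) are all illustrative assumptions.

```python
from collections import Counter

def pattern_rates(labels_per_output: list[list[str]]) -> dict[str, float]:
    """Fraction of outputs in which each pattern was flagged at least once."""
    total = len(labels_per_output)
    if total == 0:
        return {}
    counts = Counter(label for labels in labels_per_output for label in set(labels))
    return {label: n / total for label, n in counts.items()}

def spikes(baseline: dict[str, float], current: dict[str, float],
           ratio: float = 2.0, min_rate: float = 0.05) -> list[str]:
    """Patterns whose current rate is at least `min_rate` and at least
    `ratio` times the baseline rate (a missing baseline counts as zero)."""
    return sorted(
        pattern for pattern, rate in current.items()
        if rate >= min_rate and rate >= ratio * baseline.get(pattern, 0.0)
    )
```

Comparing the week before a model update against the week after, a pattern that goes from 10% to 40% of outputs would be reported, while one that stays flat would not.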

Understanding hallucination patterns turns a vague anxiety about AI quality into a concrete, actionable review process.
