LLM Hallucination Test: Q1 2026 Multi-Model Evaluation

We measured hallucination rates for GPT-4o, Claude 3.5, Gemini 1.5, Llama 3, and DeepSeek R2 on 800 company-specific factual questions. The hallucination rate here is how often a model states an incorrect fact about a real company.
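As a rough illustration of the metric, a per-model hallucination rate can be computed as the share of graded answers marked incorrect. The sketch below assumes each answer has already been graded as correct or incorrect; the function name `hallucination_rates`, the `(model, is_correct)` record shape, and the toy records are hypothetical and are not data or tooling from this study.

```python
from collections import defaultdict


def hallucination_rates(graded):
    """Return {model: fraction of answers graded incorrect}.

    `graded` is an iterable of (model_name, is_correct) pairs.
    This record shape is an assumption made for illustration.
    """
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for model, is_correct in graded:
        totals[model] += 1
        if not is_correct:
            wrong[model] += 1
    # Every model in `totals` has at least one answer, so no division by zero.
    return {model: wrong[model] / totals[model] for model in totals}


# Placeholder records only; these are NOT results from the evaluation.
toy_records = [
    ("model_a", True),
    ("model_a", False),
    ("model_b", True),
    ("model_b", True),
]
rates = hallucination_rates(toy_records)
```

In a full run, each model would contribute one graded record per question, so every denominator would equal the size of the question set.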

Methodology

Results

Key finding

