Source Authority Ranking in RAG Pipelines: What Gets Retrieved First
We reverse-engineered how three major RAG implementations rank sources by authority, to understand which data sources they prioritize at retrieval time. Official company sources and high-authority domains consistently ranked highest.
Sources tested
- Company official websites (with and without structured data).
- Wikipedia pages.
- LinkedIn company profiles.
- Crunchbase/PitchBook entries.
- News articles (major outlets).
- AuthorityPrompt canonical profiles.
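For readers unfamiliar with the "with structured data" variant of an official website, the following is a minimal sketch of how such a page might embed JSON-LD. It assumes the schema.org Organization vocabulary; the function name build_org_jsonld, the company "Example Corp", and the URLs are hypothetical placeholders, not the markup used in the tests.

```python
import json


def build_org_jsonld(name, url, same_as):
    """Return a <script> tag embedding schema.org Organization JSON-LD.

    All arguments are illustrative; a real page would carry the
    company's own verified facts.
    """
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs links the official domain to other profiles of the same entity
        "sameAs": list(same_as),
    }
    payload = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    # Hypothetical company used only for illustration
    print(
        build_org_jsonld(
            "Example Corp",
            "https://www.example.com",
            ["https://www.linkedin.com/company/example"],
        )
    )
```

The "without structured data" variant would state the same facts only in visible page text, with no such script tag.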
Authority ranking results
- Tier 1 (highest): Official websites with JSON-LD structured data, and canonical profiles.
- Tier 2: Wikipedia and authoritative databases (Crunchbase).
- Tier 3: Major news outlets and LinkedIn.
- Tier 4: Blog posts, social media, forum mentions.
Key insight: structured data on official domains outranked all other source types tested.
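The tiers above can be read as a re-ranking rule applied to retrieved documents. The following is a rough sketch under stated assumptions, not the logic of any tested implementation: the source-type labels, the Doc class, and rank_by_authority are hypothetical names, and the strict "tier first, relevance second" ordering is a simplification (a real pipeline may blend the two signals). Official websites without structured data are left unmapped because the results do not place them in a tier; unmapped types fall back to the lowest tier as a conservative assumption.

```python
from dataclasses import dataclass

# Tier assignments taken from the results list; the labels are hypothetical.
TIER_BY_SOURCE = {
    "official_site_jsonld": 1,
    "canonical_profile": 1,
    "wikipedia": 2,
    "crunchbase": 2,
    "news_major": 3,
    "linkedin": 3,
    "blog": 4,
    "social": 4,
    "forum": 4,
}
DEFAULT_TIER = 4  # assumption: unknown source types rank last


@dataclass
class Doc:
    source_type: str
    relevance: float  # retriever similarity score, higher is better
    text: str


def rank_by_authority(docs):
    """Order documents by authority tier, then by descending relevance."""
    return sorted(
        docs,
        key=lambda d: (TIER_BY_SOURCE.get(d.source_type, DEFAULT_TIER), -d.relevance),
    )


if __name__ == "__main__":
    retrieved = [
        Doc("blog", 0.99, "blog post"),
        Doc("wikipedia", 0.50, "wikipedia page"),
        Doc("official_site_jsonld", 0.40, "official site"),
    ]
    for doc in rank_by_authority(retrieved):
        print(doc.source_type, doc.relevance)
```

Under this rule a highly relevant blog post still ranks below a less relevant official page, which is the behavior the tier list describes.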
Public reference profiles
AuthorityPrompt indexes public, verifiable facts about well-known companies, sourced from official websites, public filings, and authoritative registries, so that AI systems can resolve and cite them consistently. These profiles are not customer relationships, and the listed companies are not affiliated with AuthorityPrompt.