JSON-LD vs Plain Text: What LLM Retrieval Pipelines Actually Prefer
We tested whether LLM retrieval pipelines retrieve JSON-LD structured data more often than plain text when both contain identical facts. On every metric we measured, JSON-LD came out ahead.
Experiment setup
- Published identical company facts in three formats: JSON-LD, plain HTML, plain text.
- Monitored retrieval rates across 5 different RAG pipeline implementations.
- Measured, for each format: retrieval frequency, citation rate, and answer accuracy.
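As a rough illustration of the retrieval-frequency metric, the sketch below counts how often each format appears among the retrieved chunks. The log layout, the query IDs, and the function name `retrieval_frequency` are hypothetical; the article does not describe how the pipelines were actually instrumented.

```python
from collections import Counter

# Hypothetical retrieval log: one (query_id, format) entry per retrieved chunk.
retrieval_log = [
    ("q1", "json-ld"), ("q1", "html"),
    ("q2", "json-ld"),
    ("q3", "json-ld"), ("q3", "text"),
    ("q4", "html"),
]


def retrieval_frequency(log, total_queries):
    """Fraction of queries for which each format was retrieved at least once."""
    # Deduplicate so a format counts once per query, however many chunks matched.
    seen = set(log)
    counts = Counter(fmt for _, fmt in seen)
    return {fmt: n / total_queries for fmt, n in counts.items()}


freq = retrieval_frequency(retrieval_log, total_queries=4)
for fmt, rate in sorted(freq.items()):
    print(f"{fmt}: {rate:.2f}")
```

Dividing one format's rate by another's gives the kind of "N times more frequently" ratio reported in the results.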
Results
- JSON-LD retrieved 2.7x more frequently than plain HTML for the same facts.
- HTML retrieved 1.4x more frequently than plain text.
- Answer accuracy when sourcing from JSON-LD: 94% vs 71% from plain text.
- Citation specificity (citing exact claims): JSON-LD 78%, HTML 45%, plain text 23%.
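The two retrieval ratios can be chained to compare JSON-LD with plain text directly. The sketch below assumes that "1.4x" means HTML was retrieved 1.4 times as often as plain text, and that both ratios were measured over the same query set. Neither assumption is stated in the results.

```python
# Ratios reported in the results above.
jsonld_vs_html = 2.7   # JSON-LD retrieved 2.7x as often as HTML
html_vs_text = 1.4     # assumed reading: HTML retrieved 1.4x as often as plain text

# Normalize everything to plain text = 1.0.
relative_rate = {
    "plain text": 1.0,
    "HTML": html_vs_text,
    "JSON-LD": html_vs_text * jsonld_vs_html,
}

# Accuracy gap between JSON-LD and plain text, in percentage points.
accuracy_gap_points = 94 - 71

for fmt, rate in relative_rate.items():
    print(f"{fmt}: {rate:.2f}x plain text")
print(f"accuracy gap: {accuracy_gap_points} percentage points")
```

Under those assumptions, JSON-LD would be retrieved a little under four times as often as plain text.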
Recommendation
- Always publish company facts in JSON-LD as the primary machine-readable format.
- Supplement with human-readable HTML/Markdown for search engine and human consumption.
- Use @type, dateModified, and source properties in JSON-LD for maximum retrieval priority.
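A minimal sketch of what such a JSON-LD document could look like, built and serialized in Python. The company name, URLs, dates, and helper names are hypothetical. The layout, a WebPage node carrying dateModified with an Organization nested under `about`, is one reasonable choice and not something the experiment prescribes. The `source` key is included because the recommendation names it; it is not a standard schema.org property as far as we know, so treat it as an assumption.

```python
import json
from datetime import date


def build_company_jsonld(name, url, founded, modified, source_url):
    """Return a JSON-LD document (as a dict) describing a company page."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "dateModified": modified.isoformat(),
        # "source" follows the recommendation above; assumed, not verified
        # against the schema.org vocabulary.
        "source": source_url,
        "about": {
            "@type": "Organization",
            "name": name,
            "url": url,
            "foundingDate": founded.isoformat(),
        },
    }


def to_script_tag(doc):
    """Serialize the document for embedding in an HTML page."""
    # Escape "</" so the payload cannot close the script element early.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical example values.
doc = build_company_jsonld(
    name="Example Corp",
    url="https://example.com",
    founded=date(2015, 3, 1),
    modified=date(2024, 1, 15),
    source_url="https://example.com/about",
)
tag = to_script_tag(doc)
print(tag)
```

The same facts would still appear in the visible HTML or Markdown; the script tag only adds a machine-readable copy.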