The CITE Framework
Four pillars — Crawl access, Identity, Trust signals, Extractability — that determine whether ChatGPT, Perplexity, Gemini, and Google AI Overviews cite your brand inside their answers. The methodology behind every Citable engagement, published openly.
Reading time · 8 min · Updated 2026-06
Crawl access
Optimized directives and clean visibility for the leading AI crawlers, so models can actually reach your content in the first place.
robots.txt · llms.txt · render · sitemap

Identity
Verifiable structured data and entity grounding across knowledge graphs, so a model knows exactly which brand you are and cites you, not a namesake.
schema · Wikidata · entity graph

Trust signals
Third-party citations, authority and auditable compliance: the trust boundary that makes a model confident enough to name you.
citations · authority · compliance

Extractability
AI-readable assets and answer-first lead-paragraph rewriting, so any page yields a clean, accurate, quotable answer a model can lift.
answer-first · structured · quotable

Detail
Each pillar has its own definition, its own measurement framework, and its own implementation playbook. The order matters: C gates everything; E is the highest-leverage but slowest pillar.
Pillar 01 · Crawl access
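AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Bytespider) can reach your content, and Google-Extended, a robots.txt control token rather than a separate crawler, is not disallowed. Nothing stands in the way: no robots.txt blocks, soft paywalls, or unsolved JS-render walls.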
If the crawler cannot fetch you, you do not exist to the engine. This is the first eligibility gate. Most failures here are accidental: a robots.txt copied from a CMS template, a Cloudflare rule blocking unknown agents, a JS-only site without static fallback.
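A quick way to catch the first of those accidents is to test a robots.txt against each AI user-agent token. The sketch below uses Python's standard-library `urllib.robotparser`; the `blocked_agents` helper and the sample file are hypothetical. The stdlib parser only approximates RFC 9309 (it applies the first matching rule in file order rather than the longest match), so treat its answer as a first pass, not a verdict.

```python
from urllib.robotparser import RobotFileParser

# robots.txt tokens named in the Crawl access pillar.
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended", "Bytespider"]


def blocked_agents(robots_txt: str, url: str) -> list[str]:
    """Return the AI agents that this robots.txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical template-style robots.txt that accidentally blocks GPTBot site-wide.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

if __name__ == "__main__":
    print(blocked_agents(SAMPLE, "https://example.com/pricing"))  # ['GPTBot']
```

This only covers robots.txt; soft paywalls and JS-render walls need a fetch of the live page.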
What we measure
What we ship
Pillar 02 · Identity
AI engines unambiguously identify your brand as a distinct entity — separate from competitors with similar names, separate from generic terms, with verifiable structural attributes.
Without a clear entity identity, AI engines err on the side of not citing. If two brands are both named 'Apex' and neither is clearly identified, both lose. The one with a clean schema graph, a Wikidata Q-ID, and a Google Knowledge Graph entry wins the toss-up, because there is no ambiguity left to resolve.
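One common way to publish that identity is a schema.org `Organization` node in JSON-LD whose `sameAs` points at the brand's Wikidata item. A minimal Python sketch that emits the `<script>` tag; the `organization_jsonld` helper, the brand name, the URLs, and the `Q000000` Q-ID are all placeholders, and which extra `sameAs` profiles to include is a judgment call.

```python
import json


def organization_jsonld(name, url, logo, wikidata_qid, same_as=()):
    """Build a schema.org Organization node tied to the brand's Wikidata entity."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        # A stable @id lets other nodes on the site reference this one entity.
        "@id": f"{url.rstrip('/')}/#organization",
        "name": name,
        "url": url,
        "logo": logo,
        # sameAs links let an engine reconcile this node with external knowledge graphs.
        "sameAs": [f"https://www.wikidata.org/wiki/{wikidata_qid}", *same_as],
    }


# Hypothetical brand and placeholder Q-ID; substitute real values.
node = organization_jsonld(
    name="Apex Analytics",
    url="https://apex.example/",
    logo="https://apex.example/logo.png",
    wikidata_qid="Q000000",
    same_as=["https://www.linkedin.com/company/apex-analytics-example"],
)

print('<script type="application/ld+json">')
print(json.dumps(node, indent=2))
print("</script>")
```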
What we measure
What we ship
Pillar 03 · Trust signals
Independent third-party authorities reinforce your entity claims. AI engines weight third-party signals heavily because self-claimed signals can be manipulated.
Self-attested authority is cheap. Third-party authority is expensive — and that scarcity is what makes it credible. A brand cited in Wikipedia, indexed in Wikidata, profiled by reputable press, and linked from the open knowledge graph compounds trust the way a good credit history compounds borrowing power.
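Two of those third-party signals can be checked mechanically from a Wikidata entity record: whether it has an English Wikipedia sitelink, and whether its official website (property P856) matches your domain. This sketch reads the JSON shape returned by Wikidata's `Special:EntityData/<QID>.json` endpoint, but uses an inline sample so it runs offline; the `trust_signals` helper, the placeholder Q-ID, and the domain are hypothetical.

```python
from urllib.parse import urlparse


def trust_signals(entity_json: dict, qid: str, domain: str) -> dict:
    """Check two third-party trust signals in a Wikidata entity record."""
    entity = entity_json["entities"][qid]
    # An "enwiki" sitelink means the entity has an English Wikipedia article.
    has_enwiki = "enwiki" in entity.get("sitelinks", {})
    # P856 is the "official website" property; snaks without a value carry no datavalue.
    websites = [
        claim["mainsnak"].get("datavalue", {}).get("value", "")
        for claim in entity.get("claims", {}).get("P856", [])
    ]
    hosts = {(urlparse(w).hostname or "").removeprefix("www.") for w in websites}
    return {"wikipedia_en": has_enwiki, "official_website_matches": domain in hosts}


# Hypothetical entity record in the EntityData JSON layout.
SAMPLE_ENTITY = {
    "entities": {
        "Q000000": {
            "sitelinks": {"enwiki": {"site": "enwiki", "title": "Apex Analytics"}},
            "claims": {
                "P856": [
                    {"mainsnak": {"snaktype": "value", "property": "P856",
                                  "datavalue": {"value": "https://www.apex.example/",
                                                "type": "string"}}}
                ]
            },
        }
    }
}

print(trust_signals(SAMPLE_ENTITY, "Q000000", "apex.example"))
```

Press coverage and other third-party citations do not reduce to a lookup like this and still need human review.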
What we measure
What we ship
Pillar 04 · Extractability
Your content is shaped so AI engines can lift it directly into answers. Reachable, identified, and trusted content still gets skipped if it is not extractable.
AI engines tend to lift rather than paraphrase. A page that states a clean definition in a single paragraph beats a page that buries the same definition inside marketing language. FAQ schema, HowTo schema, semantic HTML, and a high density of definitional sentences are what make a page liftable.
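FAQ schema is one of the more mechanical extractability levers: each question/answer pair becomes a schema.org `Question` with an `acceptedAnswer`, so the answer text is already isolated for lifting. A minimal sketch that builds a `FAQPage` JSON-LD node; the `faq_page_jsonld` helper is hypothetical and the sample pair is adapted from this page's FAQ.

```python
import json


def faq_page_jsonld(pairs):
    """Build a schema.org FAQPage node from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                # Answer text should open with the direct answer so its first sentence stands alone.
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_page_jsonld([
    ("How is CITE different from SEO?",
     "SEO ranks links in a results list; CITE optimizes for being cited "
     "inside an AI-synthesized answer."),
])
print(json.dumps(faq, indent=2))
```

The markup only helps if the visible page carries the same answer-first wording.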
What we measure
What we ship
Mapping
The free heuristic checker scores ten structural signals, and each checker dimension maps to a CITE pillar. The seventh dimension, Extractability, is not scored by the free checker: it requires running real prompts and is the core of the paid audit.
| Checker dimension | CITE pillar | Coverage |
|---|---|---|
| AI crawler access | C | Direct |
| llms.txt presence | C | Direct |
| Schema markup | I | Direct |
| Google Knowledge Graph | I | Direct |
| Wikipedia presence | T | Direct |
| Wikidata sameAs | T | Identity + Trust |
| Extractability | E | Paid audit only — heuristic checker scores 6 of 7 dimensions; E requires running real prompts at scale |
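To show how a table like this turns checker results into pillar scores, here is a hypothetical roll-up in Python. It assumes each dimension is a simple pass/fail, weights dimensions equally, and counts Wikidata sameAs toward both I and T as its Coverage cell suggests; Citable's actual scoring weights are not published on this page. E is left unscored because the heuristic checker cannot measure it.

```python
from __future__ import annotations

from collections import defaultdict

# Heuristic dimensions -> CITE pillars, transcribed from the mapping table.
DIMENSION_PILLARS = {
    "AI crawler access": ("C",),
    "llms.txt presence": ("C",),
    "Schema markup": ("I",),
    "Google Knowledge Graph": ("I",),
    "Wikipedia presence": ("T",),
    "Wikidata sameAs": ("I", "T"),  # Coverage: Identity + Trust
}


def pillar_scores(results: dict[str, bool]) -> dict[str, float | None]:
    """Fraction of passed dimensions per pillar (equal weights are an assumption)."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for dimension, ok in results.items():
        for pillar in DIMENSION_PILLARS[dimension]:
            total[pillar] += 1
            passed[pillar] += int(ok)
    scores: dict[str, float | None] = {p: passed[p] / total[p] for p in "CIT" if total[p]}
    # Extractability needs real prompts at scale, so the heuristic checker leaves it unscored.
    scores["E"] = None
    return scores


example = {dimension: True for dimension in DIMENSION_PILLARS}
example["llms.txt presence"] = False
print(pillar_scores(example))  # {'C': 0.5, 'I': 1.0, 'T': 1.0, 'E': None}
```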
Methodology
The framework is the content; the methodology is the cadence. Every Citable engagement runs the same three phases — and CITE pillars are how we score, prioritize, and report inside each phase.
We run 50 prompts × 4 AI engines and score each against all four CITE pillars. The output is a baseline matrix: which pillars are weakest, which prompts you are missing, which competitors win the toss-up.
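At its core, the baseline matrix is a table of prompt × engine runs, each marked cited or not (50 prompts × 4 engines gives 200 runs). A hypothetical Python sketch of two views of that matrix: a per-engine citation rate, which is our simplified reading of Share-of-Answer (a run counts as a hit if the answer cites the brand at all), and the prompts no engine cited you for. The `Run` type, prompt texts, and engine labels are illustrative, and per-pillar scoring is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    prompt: str
    engine: str
    cited: bool  # did this engine's answer cite the brand?


def share_of_answer(runs):
    """Per engine, the fraction of prompt runs whose answer cites the brand."""
    hits, totals = defaultdict(int), defaultdict(int)
    for run in runs:
        totals[run.engine] += 1
        hits[run.engine] += int(run.cited)
    return {engine: hits[engine] / totals[engine] for engine in totals}


def missed_prompts(runs):
    """Prompts for which no engine cited the brand."""
    cited_somewhere = {run.prompt for run in runs if run.cited}
    return sorted({run.prompt for run in runs} - cited_somewhere)


baseline = [
    Run("best crm for startups", "ChatGPT", True),
    Run("best crm for startups", "Perplexity", False),
    Run("crm pricing comparison", "ChatGPT", False),
    Run("crm pricing comparison", "Perplexity", False),
]
print(share_of_answer(baseline))  # {'ChatGPT': 0.5, 'Perplexity': 0.0}
print(missed_prompts(baseline))   # ['crm pricing comparison']
```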
Implementation is sequenced by pillar weight × effort. C and I are usually shippable inside a 3-month sprint. T compounds over 6–12 months. E is iterative and continues for as long as new content ships.
Monthly re-checks track CITE delta. Every shipped fix is attributable to a pillar score change. No vanity metrics, no SEO theater — every percentage point of Share-of-Answer growth is mapped back to a concrete CITE intervention.
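Attributing change to a pillar can start as simply as pairing each pillar's month-over-month score delta with the fixes shipped against that pillar. A hypothetical sketch: scores are per-pillar values between 0 and 1 (`None` for an unscored pillar), and the `cite_delta_report` helper and fix list are illustrative.

```python
from collections import defaultdict


def cite_delta_report(previous, current, shipped_fixes):
    """Pair each pillar's month-over-month score change with the fixes shipped against it.

    previous/current: pillar -> score in [0, 1], or None if unscored.
    shipped_fixes: (description, pillar) tuples for fixes shipped since the last check.
    """
    fixes_by_pillar = defaultdict(list)
    for description, pillar in shipped_fixes:
        fixes_by_pillar[pillar].append(description)
    report = {}
    for pillar in sorted(current):
        before, after = previous.get(pillar), current[pillar]
        delta = None if before is None or after is None else round(after - before, 3)
        report[pillar] = {"delta": delta, "fixes": fixes_by_pillar.get(pillar, [])}
    return report


last_month = {"C": 0.5, "I": 0.6, "E": None}
this_month = {"C": 1.0, "I": 0.6, "E": None}
fixes = [("Unblock GPTBot in robots.txt", "C")]
for pillar, row in cite_delta_report(last_month, this_month, fixes).items():
    print(pillar, row)
```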
FAQ
Did Citable invent the CITE framework?
We coined the name and the structure. The four dimensions themselves emerged from observing how ChatGPT, Perplexity, Gemini, and AI Overviews select citations across 180+ engagements. The framework is published openly here, and anyone can use it. We just ask you to credit Citable when you do, and link back to /framework if you reference it in your own work.
How is CITE different from SEO?
SEO ranks links in a results list. CITE optimizes for being cited inside an AI-synthesized answer. The technical primitives overlap (schema, crawl, content quality), but the success metric is fundamentally different: SEO measures position; CITE measures Share-of-Answer per prompt across engines. Many sites that win SEO lose CITE because they optimize for keywords rather than entity identity and extractability.
Why four pillars?
We tested eleven candidate dimensions across 180+ engagements. Four clustered cleanly with no significant overlap. The others (page authority, content freshness, internal linking, etc.) turned out to be either subsets of an existing pillar or downstream effects of getting CITE right. Occam's razor: when in doubt, fewer pillars.
Can I measure CITE for free?
Partially. All of C and most of I and T are visible in our free heuristic checker (ten structural checks, runs in 10 seconds, no email required). The E pillar, extractability, requires running real prompts against AI engines at scale, observing which content gets lifted, and scoring extractability per page. That is what the paid audit does.
Will the pillars change as AI engines evolve?
The mechanics inside each pillar evolve. The pillars themselves do not. C, I, T, E are first-principles requirements: any AI engine that retrieves and synthesizes information needs all four. New retrieval architectures will change which signals matter inside each pillar, but the pillar structure has held up across two years of model updates.
How should I reference the CITE framework?
Reference this page (/framework) and the methodology page (/methodology). For deeper engagements, the paid audit produces a per-pillar scorecard for your specific domain. For category-defining or analyst work, contact us: we share aggregate data on CITE distributions across SaaS, fintech, e-commerce, and prosumer verticals.
Two paths in. The free check tells you where you stand in 10 seconds. The paid audit tells you exactly what to fix, with a baseline you can measure forward from.
Prefer to talk first? Get in touch