The GEO Glossary.
Thirty terms that define how brands earn citation from ChatGPT, Perplexity, Gemini, Claude and Google AI Overviews. Definitional, sourced, reviewed monthly.
Maintained by: Elizabeth S., Founder, Citable. 7+ years SEO and GEO practice across EU, UK and US markets.
Last reviewed: 2026-05-27. Reviewed monthly against current AI surface behavior.
Sources: Schema.org · Google Search Central · OpenAI · Anthropic · Perplexity, plus engagement data from Citable client work.
How this glossary is built
Definitional. Not promotional.
Each entry opens with a two- to three-sentence definition written to be quoted verbatim by an AI model answering “What is X?”. No marketing tone, no hedging, no jargon stacking.
Term selection comes from the prompts our buyers actually run inside ChatGPT, Perplexity, Gemini and Google AI Overviews during AI Visibility Audits — not from SEO keyword volume. If a term shifts meaning across surfaces, the entry says so explicitly.
Contents
30 terms, three sections.
Updated 2026-05-27
S01
12 terms
Core GEO concepts
S02
12 terms
Technical foundations
- 01 Schema.org
- 02 JSON-LD
- 03 Entity (SEO/GEO)
- 04 Entity Disambiguation
- 05 Knowledge Graph
- 06 sameAs (Schema.org property)
- 07 llms.txt
- 08 AI Crawlers
- 09 GPTBot
- 10 ClaudeBot
- 11 PerplexityBot
- 12 Google-Extended
S03
6 terms
Quality, measurement & frameworks
- 01 E-E-A-T
- 02 CITE Framework
- 03 Prompt Set
- 04 AI Citation Tracking
- 05 Brand Entity Collision
- 06 AI Visibility Audit
Section 01
12 terms
Core GEO concepts
- 01 / 12 #geo
Generative Engine Optimization (GEO)
GEO
-
Generative Engine Optimization is the practice of structuring a brand's content, schema, and entity signals so that generative AI engines — ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews — cite the brand when answering buyer-intent prompts. Unlike SEO, GEO does not optimize for a position on a results page; it optimizes for inclusion in the model's generated answer.
Why → Buyers increasingly start research inside AI chat surfaces, not Google. If your brand is not cited, you are not in the consideration set.
- 02 / 12 #aeo
Answer Engine Optimization (AEO)
AEO
-
Answer Engine Optimization is the practice of structuring content so that answer engines — featured snippets, voice assistants, and AI summaries — extract and present a single direct answer attributed to the source. AEO predates GEO; in current practice, AEO is the extractability layer that GEO depends on.
Why → AEO is how a page becomes quotable. Without it, GEO has nothing to cite.
- 03 / 12 #ai-search
AI Search
-
AI Search is the category of search surfaces where a generative model composes an answer in natural language and lists a small set of cited sources, instead of returning a ranked list of links. The four dominant surfaces in 2026 are ChatGPT Search, Perplexity, Google AI Overviews, and Gemini, with Claude growing through Anthropic's web tool and integrations.
Why → AI Search collapses the ten-blue-links page to three to five citations. Real estate is scarcer and winner-take-most.
- 05 / 12 #citation
AI Citation
-
An AI citation is an explicit reference — typically a linked source card, footnote, or inline attribution — that a generative engine attaches to a generated answer to credit the source it drew from. Unlike a ranking, a citation is binary: a brand is either cited for that prompt or it is not.
Why → Citations are clickable. They are also the only durable proof that a model considered a brand authoritative on a topic.
- 06 / 12 #citable-content
Citable Content
-
Citable content is content engineered to be extracted and attributed by generative AI engines: definitional opening paragraphs, self-contained answers, structured data, clear entity references, and original data or perspective the model cannot synthesize from generic sources. The opposite is content that ranks but is not quoted, because there is nothing in it the model can lift.
Why → Models cite specific sentences, not whole pages. If your page has no quotable sentence, your page is not cited.
- 07 / 12 #content-extractability
Content Extractability
-
Content extractability is the degree to which a page exposes its key claims as discrete, self-contained passages that an AI model can lift cleanly. High extractability requires explicit definitions, short paragraphs, structured lists, FAQ blocks, and schema that names entities — not narrative prose that buries the answer under context.
Why → Extractability is upstream of citation. The model cites what it can cleanly cut out.
- 08 / 12 #ai-overviews
Google AI Overviews
AIO, SGE successor
-
AI Overviews is Google's AI-generated answer block that appears above traditional results for a growing share of queries. It composes a short summary from multiple sources and lists a small set of cited links. Coverage and citation patterns differ from ChatGPT and Perplexity, so a brand can be cited in one surface and absent from another for the same intent.
Why → AI Overviews is the AI surface most embedded into existing buyer behavior, because users are already on Google.
- 09 / 12 #chatgpt-search
ChatGPT Search
-
ChatGPT Search is OpenAI's in-product search experience inside ChatGPT. Per OpenAI's crawler documentation, OAI-SearchBot is the user-agent that surfaces sites in search results, while GPTBot collects content for model training. ChatGPT Search cites sources inline and via a side panel. Its citation behavior is distinct from Perplexity and AI Overviews and rewards content with explicit entity signals and recent updates.
Why → ChatGPT is the highest-attention AI surface in 2026. Being absent there has the highest opportunity cost.
See also → GPTBot → AI Search → Perplexity AI
- 10 / 12 #perplexity
Perplexity AI
-
Perplexity is an answer-engine search product that returns a synthesized answer with numbered, inline citations to every claim. It crawls the open web via PerplexityBot and weights structured, definitional content heavily. Among AI surfaces, Perplexity provides the most transparent attribution per claim.
Why → Perplexity is the easiest AI surface to influence directly through schema and clear definitional pages.
- 11 / 12 #gemini
Google Gemini
-
Gemini is Google's consumer-facing generative AI assistant and the model family that powers AI Overviews. As an assistant, Gemini composes answers with citations drawn from Google Search and Google-Extended-permitted sources. Its citation behavior overlaps with AI Overviews but is not identical.
Why → Gemini is Google's bet on the assistant-first future. Its share of buyer attention is growing through Android integration.
- 12 / 12 #claude
Claude (Anthropic)
-
Claude is Anthropic's generative AI assistant. Through Claude's web tool and Projects, Claude retrieves and cites web sources via ClaudeBot and integrations. Claude's share of consumer search is smaller than ChatGPT's but its B2B and developer footprint makes it disproportionately influential for technical buyer prompts.
Why → Claude is the AI surface where technical and B2B decision-makers spend the most time outside ChatGPT.
Section 02
12 terms
Technical foundations
- 01 / 12 #schema-org
Schema.org
Structured data
-
Schema.org is the shared vocabulary for marking up content with explicit type information — Organization, Person, Service, FAQPage, Article, Product, and so on. AI models and search engines consume schema as ground truth about what a page is and what entities it describes, which is why missing or inconsistent schema is one of the most common GEO blockers.
Why → Schema is the cheapest way to tell a model who you are, what you do, and what is on each page.
- 02 / 12 #json-ld
JSON-LD
-
JSON-LD is the format Google recommends for Schema.org markup: a JSON block, placed in a script element in the page head or body, that declares typed entities and their properties without coupling them to the rendered HTML. Google, Bing, OpenAI, Anthropic, and Perplexity all consume JSON-LD. Microdata and RDFa remain valid and supported, but they interleave markup with the visible HTML and are harder to maintain.
Why → JSON-LD decouples machine-readable claims from layout, so it is the schema format most likely to survive a redesign.
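As an illustration of the entry above, here is a minimal sketch that builds an Organization entity and wraps it in a JSON-LD script tag. The function name `organization_jsonld`, the brand name, and every URL are placeholders invented for this example; only `@context`, `@type`, `name`, `url`, and `sameAs` are real Schema.org terms.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Return a <script> tag carrying an Organization entity as JSON-LD."""
    entity = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    payload = json.dumps(entity, indent=2)
    # Escape "</" so a stray "</script>" inside a value cannot close the tag early.
    # "\/" is a legal JSON escape, so the payload still parses.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical values for illustration only.
tag = organization_jsonld(
    name="Example Brand",
    url="https://www.example.com/",
    same_as=[
        "https://www.linkedin.com/company/example-brand",
        "https://github.com/example-brand",
    ],
)
print(tag)
```

Generating the block from one data source, rather than hand-editing it per template, is one way to keep the same entity claims on every page.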
- 03 / 12 #entity
Entity (SEO/GEO)
-
An entity is a uniquely identifiable thing in the model's worldview — a company, a person, a product, a place, a concept. Modern AI search resolves queries to entities first and then to documents about those entities; if your brand is not a resolved entity, your content competes only as undifferentiated text.
Why → Entities, not pages, are the unit of AI search. If you are not an entity, you are not in the index.
- 04 / 12 #entity-disambiguation
Entity Disambiguation
-
Entity disambiguation is the work of making sure that a model resolves your brand name to your brand — not to a similarly-named company, a public figure, or a generic concept. It combines unique schema identifiers, consistent sameAs links to authoritative profiles (LinkedIn, Wikidata, Crunchbase), and unambiguous on-page copy.
Why → Brand-entity collisions silently route citations to the wrong target. Fix this before anything else.
- 05 / 12 #knowledge-graph
Knowledge Graph
-
A knowledge graph is a structured representation of entities and the typed relationships between them. Google's Knowledge Graph and Wikidata are the best-known public examples; together with whatever internal entity representations AI vendors maintain, they are what models reach for when deciding which brand to cite for an entity-shaped query.
Why → If you are not in any public knowledge graph, models have nothing to anchor your citation to.
- 06 / 12 #sameas
sameAs (Schema.org property)
-
sameAs is a Schema.org property used to declare that an entity is the same as the entity at a list of canonical URLs — typically LinkedIn, Wikidata, Crunchbase, GitHub, official social profiles, and the brand's own canonical page. Models use sameAs as a primary signal for entity resolution.
Why → A correct sameAs block is often the single highest-ROI line of schema you will ship.
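Inconsistent sameAs lists across pages are easy to miss by eye. The sketch below, using only the Python standard library, pulls sameAs values out of every JSON-LD block in a page so two pages can be compared. The class and function names and the sample HTML are hypothetical, and the parser deliberately ignores `@graph` nesting to stay short.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            parsed = json.loads("".join(self._buffer))
            # A block may hold one entity or a list of entities.
            self.blocks.extend(parsed if isinstance(parsed, list) else [parsed])


def same_as_links(html: str) -> set[str]:
    """Return every sameAs URL declared in the page's JSON-LD blocks."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    links: set[str] = set()
    for entity in parser.blocks:
        value = entity.get("sameAs", [])
        # sameAs may be a single URL string or a list of URLs.
        links.update([value] if isinstance(value, str) else value)
    return links


# Hypothetical pages: the About page declares fewer profiles than the home page.
home = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Organization", "sameAs": ["https://a.example/x", "https://b.example/y"]}'
    "</script></head></html>"
)
about = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Organization", "sameAs": "https://a.example/x"}'
    "</script></head></html>"
)
missing_on_about = same_as_links(home) - same_as_links(about)
print(sorted(missing_on_about))  # ['https://b.example/y']
```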
- 07 / 12 #llms-txt
llms.txt
-
llms.txt is a proposed Markdown-formatted text file, served at /llms.txt in the site root, that gives large language models a curated, hierarchical map of the site's most important pages and their purpose. It does not control crawling (robots.txt does that), but it lowers the cost for a model to locate authoritative content. Adoption is voluntary and still emerging.
Why → llms.txt is a cheap, low-risk signal of intent. It will not save bad content, but it amplifies good content.
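For illustration, a minimal sketch that renders an llms.txt file from a page inventory. The layout (a title heading, a one-line summary in a blockquote, then sections of linked pages with short notes) follows the public proposal as commonly described, and since the spec is still evolving it should be treated as an assumption. The function name, site name, and URLs are placeholders.

```python
def build_llms_txt(
    site_name: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render a Markdown llms.txt body.

    `sections` maps a section heading to (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in pages:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site inventory for illustration only.
llms_txt = build_llms_txt(
    site_name="Example Brand",
    summary="Example Brand publishes reference material on AI search visibility.",
    sections={
        "Reference": [
            ("GEO Glossary", "https://www.example.com/glossary", "Definitions of core GEO terms."),
        ],
        "Services": [
            ("AI Visibility Audit", "https://www.example.com/audit", "Scope and deliverables."),
        ],
    },
)
print(llms_txt)
```

The output would be saved as llms.txt at the site root; robots.txt still governs what crawlers may actually fetch.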
See also → AI Crawlers → Schema.org
On Citable → Citable llms.txt
- 08 / 12 #ai-crawlers
AI Crawlers
-
AI crawlers are the user-agents that AI companies use to fetch and index web content for training, retrieval-augmented generation, or live search. The major 2026 crawlers are GPTBot, OAI-SearchBot, ClaudeBot, and PerplexityBot; Google-Extended is a robots.txt control token rather than a separate crawler, but it is configured the same way. Each can be allowed, throttled, or blocked independently in robots.txt.
Why → Block them and you opt out of AI search. Allow them poorly and your worst pages get cited.
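One way to catch accidental blocks is to test a robots.txt file against each user-agent before deploying it. The sketch below uses Python's standard `urllib.robotparser`; the robots.txt content, the paths, and the URL are invented for the example. Note that this parser applies the first matching rule in a group, which can differ from the longest-match behavior some crawlers implement, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: each AI user-agent gets its own group.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /drafts/

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Disallow: /admin/
"""

AI_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def access_report(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per AI user-agent, whether robots.txt permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in AI_AGENTS}


report = access_report(ROBOTS_TXT, "https://www.example.com/glossary")
for agent, allowed in report.items():
    print(f"{agent:16} {'allowed' if allowed else 'BLOCKED'}")
```

In this sample, the Google-Extended group is the kind of blanket Disallow that is easy to ship by accident and worth flagging in review.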
- 09 / 12 #gptbot
GPTBot
-
GPTBot is OpenAI's primary crawler. It fetches public web pages to support ChatGPT and OpenAI model training, subject to robots.txt directives. A companion user-agent, OAI-SearchBot, handles ChatGPT Search live retrieval and is governed separately.
Why → Allow GPTBot and OAI-SearchBot by default. Blocking them removes you from the most-used AI surface.
See also → ChatGPT Search → AI Crawlers
- 10 / 12 #claudebot
ClaudeBot
-
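ClaudeBot is Anthropic's web crawler. Anthropic's crawler documentation describes ClaudeBot as collecting public web content that may be used for model training, and lists separate user-agents, Claude-User and Claude-SearchBot, for user-initiated fetches and search indexing. The older anthropic-ai token still appears in many robots.txt files; all of these should be considered when configuring robots.txt for AI access.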
Why → Claude's user base skews technical and B2B. Block ClaudeBot and you lose that buyer.
See also → Claude (Anthropic) → AI Crawlers
- 11 / 12 #perplexitybot
PerplexityBot
-
PerplexityBot is the crawler used by Perplexity to populate its answer engine. Of the major AI crawlers, PerplexityBot has the most direct cause-and-effect relationship with citation: pages it cannot reach cannot be cited inline in Perplexity answers.
Why → PerplexityBot is the fastest crawler to citation. Allow it and well-structured pages get cited in days.
See also → Perplexity AI → AI Crawlers
- 12 / 12 #google-extended
Google-Extended
-
Google-Extended is a robots.txt token that controls whether Google may use a site's content to train Gemini models and to ground Gemini answers. It is distinct from Googlebot, which controls crawling and indexing for Google Search. Per Google's crawler documentation, blocking Google-Extended does not remove a site from Google Search and is not a ranking signal; AI Overviews is a Search feature, so eligibility there is governed by Googlebot access and snippet controls rather than by Google-Extended.
Why → Google-Extended decides whether your content can inform Gemini. Set it deliberately; many sites block it by accident.
Section 03
6 terms
Quality, measurement & frameworks
- 01 / 06 #eeat
E-E-A-T
Experience, Expertise, Authoritativeness, Trustworthiness
-
E-E-A-T is the quality framework from Google's Search Quality Rater Guidelines: Experience, Expertise, Authoritativeness, and Trustworthiness. It is not a ranking factor on its own but is the lens by which search and AI systems evaluate whether a page is safe to cite. In GEO, E-E-A-T translates to named authors with credentials, sourced claims, dated content, and verifiable third-party mentions.
Why → Models prefer to cite pages whose source they can defend. E-E-A-T is the defense.
- 02 / 06 #cite-framework
CITE Framework
Crawl · Identity · Trust · Extractability
-
The CITE Framework is Citable's four-pillar model for engineering AI citation: Crawl (AI crawlers can reach the site), Identity (the brand resolves as a clean entity), Trust (E-E-A-T signals justify citation), and Extractability (content is structured to be quoted). Every Citable engagement is mapped to these four pillars.
Why → Without all four pillars, GEO work compounds slowly or not at all. CITE makes the gap auditable.
- 03 / 06 #prompt-set
Prompt Set
-
A prompt set is the fixed list of buyer-intent prompts used to measure Share of Answer for a brand across AI surfaces. A well-built set covers awareness, consideration, and decision intent in proportion, mirrors the conversational format AI users actually type, and is held constant month-over-month so that change is comparable.
Why → Move the prompt set and you lose the baseline. Discipline here is what makes GEO measurable.
- 04 / 06 #ai-citation-tracking
AI Citation Tracking
-
AI citation tracking is the continuous monitoring of where a brand is and is not cited across AI surfaces for a defined prompt set. Tools in this category — Profound, Peec, AthenaHQ, Otterly, and others — automate what manual prompt testing does for a baseline, with the trade-off that no single tool covers all four major surfaces with equal accuracy.
Why → Tracking gives you the data. It does not change what AI says — only implementation does that.
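The baseline that manual prompt testing produces can be summarized with a few lines of code. This sketch computes, per surface, the fraction of prompts in which a brand's domain was cited. That per-surface citation rate is an assumed stand-in for a Share of Answer style metric, which this glossary does not define formally, and the `Observation` structure, prompts, and domains are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One prompt run on one AI surface, with the domains it cited."""
    surface: str
    prompt: str
    cited_domains: frozenset[str]


def citation_rate(observations: list[Observation], brand_domain: str) -> dict[str, float]:
    """Fraction of prompts per surface whose answer cited `brand_domain`."""
    totals: dict[str, int] = defaultdict(int)
    hits: dict[str, int] = defaultdict(int)
    for obs in observations:
        totals[obs.surface] += 1
        if brand_domain in obs.cited_domains:
            hits[obs.surface] += 1
    return {surface: hits[surface] / totals[surface] for surface in totals}


# Hypothetical results from a two-prompt set run on two surfaces.
runs = [
    Observation("ChatGPT", "best geo agency", frozenset({"example.com", "other.example"})),
    Observation("ChatGPT", "what is geo", frozenset({"other.example"})),
    Observation("Perplexity", "best geo agency", frozenset({"example.com"})),
    Observation("Perplexity", "what is geo", frozenset({"example.com", "third.example"})),
]
rates = citation_rate(runs, "example.com")
print(rates)  # {'ChatGPT': 0.5, 'Perplexity': 1.0}
```

Because the rate is computed per prompt, it is only comparable month to month if the prompt set itself is held constant.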
- 05 / 06 #brand-entity-collision
Brand Entity Collision
-
A brand entity collision occurs when an AI model conflates your brand with another entity of the same or similar name — a different company, a public figure, a generic concept — and routes citations or facts to the wrong target. Collisions are common for short, generic, or dictionary-word brand names and are resolved through entity disambiguation work.
Why → A collision is invisible in analytics but lethal for AI search. Audit for it before measuring anything else.
- 06 / 06 #ai-visibility-audit
AI Visibility Audit
-
An AI Visibility Audit is the diagnostic engagement that establishes a brand's current Share of Answer baseline across the major AI surfaces and identifies the structural reasons it is absent where it should appear. The Citable audit runs a 50-prompt set against ChatGPT, Perplexity, Gemini, and AI Overviews, with documented screenshots and a prioritized 90-day roadmap.
Why → Implementation without a baseline is faith. The audit makes every later decision evidence-led.
On Citable → Audit · 1,200 EUR
FAQ
Disambiguations and frequent confusions.
What is the difference between SEO, GEO, and AEO?
SEO optimizes for a ranked position on a search engine results page. GEO (Generative Engine Optimization) optimizes for being cited inside an AI-generated answer on ChatGPT, Perplexity, Gemini, or Google AI Overviews. AEO (Answer Engine Optimization) optimizes for being extracted as the single direct answer — featured snippets, voice, AI summaries. In current practice, AEO is the extractability layer GEO depends on, and SEO remains the foundation that gets your content crawled and indexed in the first place.
Is GEO replacing SEO?
No. GEO does not replace SEO; it extends it. AI search surfaces still rely on the open web as their source corpus, and that corpus is shaped by the same crawling, indexing, schema, and authority work that SEO has always covered. The change is that ranking on page one is no longer sufficient — the brand also has to be quotable, entity-resolved, and crawler-accessible to the AI surfaces specifically.
How do I know if my brand has a brand-entity collision?
Run your exact brand name as a prompt in ChatGPT, Perplexity, and Google AI Overviews, then ask each: 'What is [Brand Name]?' If the answer describes a different company, person, or generic concept — or hedges with 'there are multiple entities by this name' — you have a collision. The fix is entity disambiguation work: schema with explicit identifiers, sameAs links to authoritative third-party profiles, and unambiguous on-page copy.
Which AI surfaces should I optimize for first?
Prioritize by overlap with your buyer. For most B2B brands, the order is Perplexity (fastest cause-and-effect, transparent citations), ChatGPT Search (highest user attention), Google AI Overviews (most embedded into existing search behavior), then Gemini and Claude. The same structural work — schema, entity disambiguation, content extractability — moves all five surfaces; only the order of citation improvement differs.
Should I add llms.txt to my site?
Yes, if you also have the underlying schema and content right. llms.txt is a low-risk, low-effort signal that gives models a curated map of your most important pages. It will not rescue a site with no entity signals or unparseable content, but on a well-structured site it amplifies discovery. Adoption is voluntary and the spec is still evolving — treat it as additive, not load-bearing.