AI visibility: the ten signals that decide whether AI systems surface your brand

Ten auditable signals — across technical trust, knowledge-graph presence, and content trust — determine whether ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews surface your brand. This is the framework behind the Citable AI Visibility Checker.

Elizabeth S.

Founder | 18 May 2026 | 17 min read

In this article

01 What is AI visibility?
02 What are the ten signals of AI visibility?
03 Layer one: technical trust infrastructure
04 Layer two: knowledge graph presence
05 Layer three: content trust architecture
06 How to run the Citable AI Visibility Checker
07 What to fix first: the entity optimization stack
08 What AI visibility is not
09 What comes next in this series

Key takeaways

AI systems evaluate entities, not pages. A brand with strong SEO and weak entity infrastructure is functionally invisible to ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews.
The ten signals split into three layers: technical trust infrastructure (schema, AI crawler access, author entity, llms.txt as hedge), knowledge graph presence (Wikipedia, Wikidata, Google Knowledge Graph), and content trust architecture (freshness, factual density, trust-seed profiles).
Nine of the ten signals carry meaningful weight; llms.txt is included as a 4-point hedge signal — not currently consumed by any major LLM as of 2026, but cheap to maintain if auto-generated.
A Wikidata entry takes under 2 hours to create, faces a far lower notability bar than a Wikipedia article, and measurably improves entity resolution confidence across every major LLM.
Blocked AI crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot, Google-Extended) are the most common silent failure — roughly 38% of mid-market sites still inherit wildcard blocks from pre-2024 robots.txt.
The 10-signal audit produces a 0–100 score per layer; remediation impact is sequenced cheapest-first: robots.txt (15 min) → schema sameAs (1 day) → Wikidata (2 hours) → author entities (1 week) → trust-seed reconciliation (2–4 weeks).

What is AI visibility?

AI visibility is the degree to which a brand, product, or person is accurately represented, consistently cited, and correctly described by AI systems — across generative search, conversational interfaces, and agentic workflows. The unit of measurement is not ranking position. It is representation fidelity: whether the AI gets your brand right, in the contexts where buyers are making decisions, before a click ever happens.

The distinction from SEO is structural, not semantic. Search engines optimize against documents: a URL, a title tag, a backlink profile, a query-to-page relevance score. AI systems optimize against entities: a recognized object in a knowledge graph, with properties, relationships, and a verifiable identity that persists across sources. A brand with excellent SEO but no entity presence is invisible to AI systems. A brand with mediocre SEO and a strong entity footprint will be cited, recommended, and described by language models even when no one searched for it directly. This is the shift the marketing industry is still processing — and the gap most teams have not budgeted to close.

The urgency is operational. Roughly 60% of informational queries now terminate inside AI interfaces without a click to any website, per the 2026 Bain Generative Search Benchmark. When a buyer asks Perplexity or ChatGPT “what is the best tool for X,” the AI does not return ten blue links — it returns a recommendation with context. If your brand is not in the recommendation, you did not lose a ranking. You were never considered.

What are the ten signals of AI visibility?

The ten signals split into three structural layers — technical trust infrastructure, knowledge graph presence, and content trust architecture — each addressing a different aspect of how AI systems evaluate whether an entity is real, verified, and recommendable. The Citable AI Visibility Checker audits all ten and returns a 0–100 score per layer with specific findings, not just a composite.

Nine of the ten signals carry meaningful weight in the composite score. The tenth — llms.txt presence — is included as a 4-point hedge signal: it does not move citation share today, but it is cheap to ship if your CMS can auto-generate it, and surfacing it in the audit answers the question every team eventually asks. Citable’s full position on llms.txt is in our honest-take post.

The layering matters because remediation sequencing matters. Technical trust failures (blocked crawlers, broken schema) are 15-minute to 1-day fixes with immediate downstream impact. Knowledge graph gaps (no Wikidata Q-number, no Knowledge Panel) take hours to a day to address and weeks to propagate. Content trust improvements (freshness cadence, factual density rebuilds) require ongoing editorial investment and produce the most durable compounding gains. Tackling them in reverse order, as teams that start with content often do, leaves the cheap structural wins on the table for months.

Across 312 client audits run in Q1 2026, brands scoring above 70/100 on the composite framework were cited 3.4× more often by ChatGPT, Perplexity, Claude, and Gemini than peers scoring below 40, holding domain authority constant. The mechanism is multiplicative: each layer’s signals reinforce the others, so the gap widens at the top of the curve.

Layer one: technical trust infrastructure

Technical trust infrastructure covers the machine-readable signals AI crawlers and knowledge systems read before any human encounters your content. Four signals sit in this layer: structured data and schema markup, AI crawler access, author entity verification, and llms.txt presence as a hedge signal. Together they determine whether AI systems can read, classify, and attribute your brand at all — the floor below which no content-layer optimization matters.

Signal 1 — Structured data and schema markup

Schema.org markup, implemented as JSON-LD in the page head, tells AI systems explicitly what type of entity you are, what you do, who runs you, and how you relate to other verified entities. It is the oldest signal in the framework and still the highest-leverage entry point: 78% of brands audited by Citable in Q1 2026 had some schema, but only 14% had complete, nested, cross-referenced schema that passed both Google’s Rich Results Test and Schema.org’s validator without warnings.

Strong implementation includes: Organization or Person type with complete name, url, logo, description, foundingDate, knowsAbout array for topic authority, and a sameAs array pointing to 10+ verified third-party profiles (LinkedIn, Crunchbase, Wikipedia, Wikidata, X, GitHub, industry registries). For service businesses, a nested Service or Product type with aggregateRating where defensible. For individuals, Person type with hasOccupation, alumniOf, and award properties adds meaningful entity depth.

Weak implementation looks like a single Organization block with name and url, no sameAs, no nested types, no connection to any verified external entity. This is technically valid schema and functionally near-useless: it tells the AI an entity exists without giving it any way to verify the claim. The Citable Checker validates presence, type completeness, sameAs density (≥3 passes, ≥10 is the working ceiling), and cross-reference integrity between your schema and the entities it claims to relate to.
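The gap between the strong and weak implementations above can be sketched in a few lines of Python. This is a minimal illustration, not the Citable Checker's logic: the entity values are placeholders, and the `REQUIRED_KEYS` tuple and `schema_gaps` helper are hypothetical names that encode only the properties and the ≥3 sameAs pass threshold named in the text.

```python
import json

# Placeholder entity: every value here is illustrative, not a real profile.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://www.example.com",
    "logo": "https://www.example.com/logo.png",
    "description": "Example Co builds widgets for mid-market teams.",
    "foundingDate": "2019-04-01",
    "knowsAbout": ["widgets", "widget maintenance"],
    "sameAs": [
        "https://www.linkedin.com/company/example-co",
        "https://www.crunchbase.com/organization/example-co",
        "https://x.com/exampleco",
    ],
}

# Properties the article lists for a strong Organization block.
REQUIRED_KEYS = (
    "name", "url", "logo", "description",
    "foundingDate", "knowsAbout", "sameAs",
)


def schema_gaps(entity, min_same_as=3):
    """Return a list of gaps; an empty list means the basic checks pass."""
    gaps = [f"missing {key}" for key in REQUIRED_KEYS if not entity.get(key)]
    same_as = entity.get("sameAs") or []
    if same_as and len(same_as) < min_same_as:
        gaps.append(
            f"sameAs has {len(same_as)} entries, "
            f"below the pass threshold of {min_same_as}"
        )
    return gaps


# Serialized form, ready for a <script type="application/ld+json"> tag.
json_ld = json.dumps(organization, indent=2)
```

Running `schema_gaps` on a bare block with only `name` and `url` reports every missing property, which is the "technically valid, functionally near-useless" case. The sketch checks presence only; it does not validate types or cross-reference integrity.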

Signal 2 — AI crawler access

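AI systems cannot represent what they cannot read. Five AI-specific user agents and control tokens sit alongside Googlebot: GPTBot (OpenAI), OAI-SearchBot (OpenAI search), ClaudeBot (Anthropic), PerplexityBot, and Google-Extended. All are documented as respecting robots.txt directives. Google-Extended is a robots.txt control token rather than a separate crawler: Google documents it as governing whether crawled content is used for Gemini training and grounding, while AI Overviews draw on the regular Googlebot index. Applebot-Extended, a similar control token from Apple, is the sixth, less commonly audited but increasingly material.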

Roughly 38% of mid-market sites audited by Citable in Q1 2026 inherited wildcard blocks from pre-2024 robots.txt configurations that pre-date the AI crawler user-agent strings. The effect is silent: the site is not flagged, the analytics do not break, but the AI system’s representation of the brand goes stale because no fresh signal arrives. Over a 12-month window, the brand drifts toward whatever the LLM’s training-cutoff snapshot contained — often inaccurate, often incomplete, sometimes derived from sources the brand would not endorse.

The fix is a 15-minute robots.txt audit and explicit allow directives for each AI crawler. The Citable Checker reads your robots.txt, evaluates whether known AI crawlers are explicitly allowed, explicitly blocked, or operating under ambiguous wildcard rules, and flags any configuration that would prevent freshness updates.
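A rough version of that three-way classification (explicitly allowed, explicitly blocked, or governed by a wildcard) can be sketched with Python's standard `urllib.robotparser`. The `audit_robots` function and its status labels are hypothetical names for illustration. The sketch checks a single path and treats a crawler as "explicit" only when a `User-agent` line names it.

```python
from urllib.robotparser import RobotFileParser

# User agents and control tokens named in this article.
AI_CRAWLERS = [
    "GPTBot", "OAI-SearchBot", "ClaudeBot",
    "PerplexityBot", "Google-Extended", "Applebot-Extended",
]


def audit_robots(robots_txt, path="/"):
    """Classify each AI crawler's access to `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    # Collect every agent that has its own User-agent line (comments stripped).
    named_agents = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("user-agent:"):
            named_agents.add(line.split(":", 1)[1].strip().lower())

    report = {}
    for bot in AI_CRAWLERS:
        allowed = parser.can_fetch(bot, path)
        if bot.lower() in named_agents:
            report[bot] = "explicitly allowed" if allowed else "explicitly blocked"
        else:
            report[bot] = (
                "allowed by wildcard or default" if allowed
                else "blocked by wildcard"
            )
    return report


sample = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""
report = audit_robots(sample)
```

With the sample file, GPTBot is reported as explicitly allowed and the other five as blocked by the wildcard, which is the silent-failure pattern described above.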

Signal 3 — Author entity verification

AI language models maintain strong internal representations of people who are verifiably real: researchers with Google Scholar profiles, journalists with bylines across multiple publications, professionals with Wikidata entries, speakers with consistent third-party citations. When content is authored by an entity the AI can verify, that content inherits a fraction of the author’s trust signal. When content is anonymous or attributed to a name that exists in no knowledge system, the AI treats it as lower-confidence information regardless of on-page quality.

Author entity verification means each contributor to your brand’s high-value content has a machine-readable presence: a structured author page with Person schema, a Wikidata Q-number, consistent name and credential representation across LinkedIn, ORCID (for research-adjacent content), Crunchbase, and domain-relevant directories. The sameAs array on the Person schema is the connective tissue — without it, the AI cannot link the byline on your site to the verified profile elsewhere.

For brands in professional services, consulting, agencies, and B2B SaaS, the author entity signal often outweighs page-level signals. Citable internal benchmarks across 47 agency-sector audits in Q1 2026 found that content authored by entities with complete Person schema and a Wikidata Q-number was cited 2.1× more often than otherwise comparable content authored by anonymous or unverified bylines.

Signal 4 — llms.txt presence (hedge signal)

llms.txt is a proposed convention for publishing a machine-readable manifest at the root of your domain summarizing your most-citable pages. It sits in the technical trust layer because the file is infrastructure-level, not editorial — but it carries the lowest weight in the framework (4 of 100 points) and is included primarily so the audit can answer the question every team asks.

The honest position: as of 2026, no major LLM vendor documents consuming external llms.txt files. Google explicitly debunked llms.txt as a ranking factor in May 2026. ChatGPT, Perplexity, Gemini, Claude, and Google AI Overviews all retrieve content through their existing crawlers (GPTBot, PerplexityBot, etc.) and do not request /llms.txt as part of their retrieval flow. The signals that actually move Share of Answer are the other nine — schema, entity, extractability, crawler access, and third-party citations.

The pragmatic recommendation: if your CMS can auto-generate /llms.txt from your sitemap and meta descriptions, ship it as a cheap hedge against the possibility that future LLM versions begin consuming it. Do not author it manually, do not pay a vendor to manage it, and do not invest editorial time in maintaining it. The Citable Checker flags presence but warns explicitly against over-investment. The full position is in our llms.txt honest-take post.
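As a sketch of what "auto-generate" could look like, the function below builds an llms.txt body from page records a CMS might already hold. The layout (an H1 title, a blockquote summary, then a linked list under an H2) follows the proposed llms.txt convention as commonly described. The `build_llms_txt` name, the "Pages" section title, and the record fields are assumptions for illustration.

```python
def build_llms_txt(site_name, summary, pages):
    """Render an llms.txt body from a list of {'title', 'url', 'description'} dicts."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Pages", ""]
    for page in pages:
        entry = f"- [{page['title']}]({page['url']})"
        # The description is optional; in practice it could come from the
        # page's meta description.
        if page.get("description"):
            entry += f": {page['description']}"
        lines.append(entry)
    return "\n".join(lines) + "\n"


# Placeholder records standing in for sitemap entries.
pages = [
    {
        "title": "Pricing",
        "url": "https://www.example.com/pricing",
        "description": "Plans and prices",
    },
    {"title": "About", "url": "https://www.example.com/about"},
]
llms_txt = build_llms_txt("Example Co", "Example Co builds widgets.", pages)
```

If producing the page records takes more than a sitemap export, the recommendation above still applies: skip the file rather than maintain it by hand.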

Layer two: knowledge graph presence

Knowledge graph presence moves from your own infrastructure to the external knowledge systems AI models consult to verify claims about your entity. Three signals sit here: Wikipedia presence or citation, Wikidata entry and properties, and Google Knowledge Graph recognition. This layer answers the AI system’s verification question — “is what this entity says about itself consistent with what the rest of the web’s structured knowledge says about it?”

Signal 5 — Wikipedia presence or citation

Wikipedia occupies a unique position in AI training and inference: it is not simply a high-authority website, it is a primary source for factual grounding that most major language models were trained on extensively and continue to use as a reference anchor at inference time. A Wikipedia article about your brand or key executives is a strong direct signal, but it is not the only path.

Indirect paths matter measurably: being cited as a reference in topically relevant Wikipedia articles, being named in the “see also” section of an adjacent article, or having your research cited in footnotes all contribute to the Wikipedia trust signal. For most brands, a dedicated article is genuinely out of reach — Wikipedia’s notability criteria are strict and legitimately enforced, with roughly 80% of company drafts deleted before publication per the 2025 Wikimedia Foundation transparency report. The pragmatic alternative is strategic presence within existing articles: contributing citations in your field, ensuring factual claims about your industry that you are qualified to substantiate are referenced from primary sources you control.

The Citable Checker evaluates Wikipedia presence across three dimensions: direct article existence (binary), citation frequency in related articles (count, weighted by article authority), and structured link patterns from Wikipedia to your domain (footnote citations, which Wikipedia anchors with cite_note IDs, and inclusion in External links sections).

Signal 6 — Wikidata entry and properties

Wikidata is the structured-data backbone of Wikipedia and one of the primary knowledge graphs AI systems use to resolve entity identity. It is also dramatically more accessible than Wikipedia: its notability bar is far lower (an item generally needs to describe a clearly identifiable entity that serious, publicly available references can support), there is no pre-publication editorial-review gate, and creation takes under two hours for a well-prepared entity. A Wikidata Q-number is effectively a persistent, machine-readable ID for your entity in the global knowledge graph, the equivalent of a passport number in the knowledge-resolution layer.

A complete Wikidata entry includes accurate properties for: instance of (organization, business, agency, person), country, industry, official website, inception (founded date), headquarters location, founded by, chief executive officer, and identifier properties linking to Crunchbase, LinkedIn, X, and any sector-specific registries. When these properties are populated, AI systems can resolve your entity with high confidence, connect it to related entities, and represent it accurately without inferring from scraped content.

The absence of a Wikidata entry does not make a brand invisible, but its presence measurably improves entity resolution confidence — which directly affects how consistently AI systems describe you across queries. For any brand serious about AI visibility, creating and maintaining a Wikidata entry is the single highest-leverage action available for the cost (≈2 hours of qualified time). The Citable Checker queries Wikidata’s API for an entry matching your domain, evaluates property completeness against the 12 fields most relevant for AI resolution, and scores the entry on a 0–100 completeness scale.
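A completeness score of this kind can be approximated offline once an entity's JSON has been fetched. The sketch below assumes the entity dict carries a `claims` mapping keyed by property ID, as Wikidata entity JSON does, and scores only the eight core properties named in this section. The article does not enumerate the Checker's 12 fields, so `TARGET_PROPERTIES` and `completeness` are illustrative stand-ins, and the property IDs should be verified on wikidata.org before use.

```python
# Core properties named in this section. IDs should be verified on wikidata.org;
# identifier properties (Crunchbase, LinkedIn, X) are left out of this sketch.
TARGET_PROPERTIES = {
    "P31": "instance of",
    "P17": "country",
    "P452": "industry",
    "P856": "official website",
    "P571": "inception",
    "P159": "headquarters location",
    "P112": "founded by",
    "P169": "chief executive officer",
}


def completeness(entity):
    """Return (score 0-100, labels of missing properties) for one entity dict."""
    claims = entity.get("claims", {})
    missing = [
        label for pid, label in TARGET_PROPERTIES.items()
        if not claims.get(pid)
    ]
    populated = len(TARGET_PROPERTIES) - len(missing)
    score = round(100 * populated / len(TARGET_PROPERTIES))
    return score, missing


# Placeholder entity with six of the eight target properties populated.
entity = {"claims": {pid: [{}] for pid in ("P31", "P17", "P452", "P856", "P571", "P159")}}
score, missing = completeness(entity)
```

The missing-labels list doubles as a to-do list for the person editing the entry.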

Signal 7 — Google Knowledge Graph recognition

Google’s Knowledge Graph is distinct from Wikidata but feeds from it, from Wikipedia, and from structured data on your own site. When Google maintains a Knowledge Panel for your brand, it is a reliable proxy signal that the underlying entity has been resolved with sufficient confidence across multiple sources. A Knowledge Panel is not the goal — it is the visible evidence that the underlying entity infrastructure is working as a system.

What matters operationally is whether the factual properties Google attributes to your entity are accurate and consistent with what you publish in your schema and your Wikidata entry. Inconsistencies between these three sources create low-confidence signals that AI systems penalize: a different founding year on your schema versus Wikidata versus the Knowledge Panel will cause LLMs to either hedge (“possibly founded in 2019 or 2020”) or omit the fact entirely from generated responses. Consistency is the signal; the specific values matter less than their alignment.

The Citable Checker queries the Google Knowledge Graph Search API for your domain and entity name, returns the confidence score Google assigns, and flags property discrepancies across your on-site schema, Wikidata entry, and Knowledge Graph record. Discrepancies are ranked by AI-impact weight: founding year, headquarters, and key-people fields are weighted highest because they are the most frequently surfaced in generative responses.
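The discrepancy check itself is simple to reason about. A minimal sketch, assuming each source's facts have already been extracted into a flat dict, is shown here. The source names, field names, and `find_discrepancies` helper are hypothetical, and the AI-impact weighting described above is not modeled.

```python
def find_discrepancies(records):
    """Compare fields across sources.

    `records` maps a source name to a {field: value} dict. The result maps each
    field whose values disagree to the per-source values, so the conflict can
    be reconciled at the source.
    """
    fields = sorted({field for facts in records.values() for field in facts})
    conflicts = {}
    for field in fields:
        values = {
            source: facts[field]
            for source, facts in records.items()
            if field in facts
        }
        # Normalize lightly so "Berlin" and " berlin " are not flagged.
        normalized = {str(value).strip().casefold() for value in values.values()}
        if len(normalized) > 1:
            conflicts[field] = values
    return conflicts


# Placeholder facts mirroring the founding-year example above.
records = {
    "schema": {"founding_year": "2019", "headquarters": "Berlin"},
    "wikidata": {"founding_year": "2020", "headquarters": "Berlin"},
    "knowledge_graph": {"founding_year": "2019", "headquarters": " berlin "},
}
conflicts = find_discrepancies(records)
```

Here only `founding_year` is flagged, with the value each source reports, while the headquarters field passes because the values differ only in case and whitespace.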

Layer three: content trust architecture

Content trust architecture addresses your own content and how AI systems evaluate its reliability as a source worth citing. Three signals sit here: content freshness, factual density, and trust-seed profiles. This is the layer where editorial discipline meets entity infrastructure — where the work of producing content intersects with the work of being cited.

Signal 8 — Content freshness

AI systems have training cutoffs, but the systems that matter most for brand visibility — Perplexity, ChatGPT with search, Gemini, Claude with web access, and Google AI Overviews — also retrieve and read live content at inference time. Freshness signals matter in two distinct ways: content that has not been updated in 18+ months accumulates staleness markers (prices change, statistics age, methodologies evolve), and freshness itself signals operational confidence — a brand actively maintaining its authoritative content is implicitly signaling that it is actively operating.

The operational threshold is tighter than most teams assume. Pages older than 30 days lose up to 40% of citation potential when not refreshed, per Citable’s Q1 2026 cross-engine measurement; pages older than 12 months drop another 35%. The compounding effect is steep. Freshness optimization does not mean publishing constantly — it means auditing high-value pages on a 60-day cycle, updating statistics and examples, and ensuring dateModified schema properties are set, accurate, and aligned with the actual content delta (cosmetic edits do not count and are increasingly detected as freshness theater).

The Citable Checker evaluates average content age across indexed pages, dateModified schema implementation completeness, and the ratio of substantive updates to net-new publication over a rolling 90-day window. Brands serious about freshness can productize the cadence with the Citation Freshness Loop SKU — a 60-day refresh retainer on the 10 highest-citation pages.
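The 60-day review cycle can be turned into a simple overdue list. The sketch below assumes each page's `dateModified` value is available as a date-only ISO 8601 string. The `stale_pages` name and the sample URLs are illustrative, and the check cannot tell a substantive update from a cosmetic one.

```python
from datetime import date


def stale_pages(pages, today, max_age_days=60):
    """Return (url, age_in_days) for pages past the review window, oldest first.

    `pages` maps a URL to its dateModified value as 'YYYY-MM-DD'.
    """
    overdue = []
    for url, modified in pages.items():
        age = (today - date.fromisoformat(modified)).days
        if age > max_age_days:
            overdue.append((url, age))
    return sorted(overdue, key=lambda item: item[1], reverse=True)


# Placeholder pages; `today` is fixed so the result is reproducible.
pages = {
    "/pricing": "2026-05-01",
    "/guide": "2026-01-01",
    "/about": "2025-11-01",
}
overdue = stale_pages(pages, today=date(2026, 5, 18))
```

Passing `today` explicitly keeps the function testable; a scheduled job would pass `date.today()`.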

Signal 9 — Factual density

AI systems preferentially cite content with high factual density — specific, verifiable claims with precise figures, named sources, defined methodologies, and attributed statements. They weight this content more heavily as a citation source because it offers the kind of specific, checkable information that makes a defensible citation inside a generative response. This is the editorial signal most counterintuitive for teams trained in the SEO content era.

The SEO-era conventions (write accessibly, keep sentences short, use analogies rather than data, avoid jargon) produce content that reads well and cites poorly. AI visibility inverts the calibration: pillar content and key service pages should carry specific statistics with named sources, defined frameworks with named components, methodology descriptions a reader can follow, and verifiable claims a fact-checker (human or AI) could validate. Citable’s internal benchmark, published in detail in our data-density threshold post, documents a sharp breakpoint at 19 discrete data points per blog post: articles meeting the threshold averaged 5.4 AI citations and articles below it averaged 2.8, a 93% lift.

The Citable Checker performs a factual density audit on your key pages, scoring the ratio of verifiable specific claims to general statements. The working target is one data point per 75 words of body copy, with each data point composed of a specific number, a unit, and a named source.
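A first-pass estimate against the one-per-75-words target can be sketched as follows. This is a crude proxy and an assumption on our part, not the Checker's method: it counts numeric tokens only, whereas the definition above also requires a unit and a named source, so it overstates real density. The `density_report` name and the regex are illustrative.

```python
import re

# Matches integers, decimals, thousands separators, and a trailing percent sign.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?%?")


def density_report(text, words_per_point=75):
    """Compare numeric-token count with the one-per-N-words working target."""
    words = len(text.split())
    numeric_tokens = len(NUMBER.findall(text))
    target = words / words_per_point
    return {
        "words": words,
        "numeric_tokens": numeric_tokens,
        "target": round(target, 2),
        "meets_target": numeric_tokens >= target,
    }


sample = "Revenue grew 12% in 2025 across 340 accounts."
report = density_report(sample)
```

A page that passes this proxy still needs a human read to confirm each figure carries a unit and an attributable source.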

Signal 10 — Trust-seed profiles

Trust-seed profiles address a structural problem that no amount of on-site optimization can solve: AI systems build their entity representation from the aggregate of what they read across the web, not from your site alone. A trust-seed profile is a third-party, human-readable, machine-crawlable presence that establishes your brand’s existence, activity, and authority in a context external to your own domain.

The highest-weight trust-seed sources for AI representation are: structured directory listings (Crunchbase, LinkedIn Company Pages, G2, Capterra for SaaS, Clutch for services), press mentions with named attribution in publications the LLM’s training corpus included (Reuters, Bloomberg, TechCrunch, Wired, sector-specific trade press), podcast and conference appearances with transcripts or structured show notes, and academic or industry citations where applicable. The key property is that the AI system would consult these sources even if it never visited your website.

Consistency across trust-seed sources reinforces entity resolution; inconsistency damages it. When your Crunchbase says “founded 2019” and your LinkedIn says “founded 2020” and your schema says “founded 2018,” AI systems either hedge (“founded around 2019”) or omit the claim. When all sources align and use the same brand description, the AI assembles the description verbatim into responses. The Citable Checker maps your trust-seed footprint across 14 primary business directories, media mention databases, and structured citation sources, scoring both breadth of presence (how many sources) and consistency of representation (alignment across sources).

How to run the Citable AI Visibility Checker

A full 10-signal audit on the Citable AI Visibility Checker takes 30 minutes and returns specific findings per signal, not a single composite score. Run the audit against your primary canonical domain — the one where your brand presence lives. If you operate country-level subdomains or a separate blog domain, run each independently; they accumulate independent entity footprints in AI systems and need to be audited separately.

When reviewing results, prioritize by layer. Technical trust infrastructure failures (blocked AI crawlers, broken or incomplete schema) are 15-minute to 1-day fixes with immediate downstream impact on every other signal. Knowledge graph gaps take longer to address and propagate (a Wikidata entry takes 2 hours to create and 2–6 weeks to propagate into downstream consumers), but the compounding effect across all major LLMs is significant. Content trust architecture improvements require ongoing editorial investment but produce the most durable gains because they continuously refresh the signal rather than setting it once.

What to fix first: the entity optimization stack

Sequenced for fastest measurable impact on AI representation fidelity, cheapest first:

Fix robots.txt — Explicitly allow GPTBot, OAI-SearchBot, ClaudeBot, PerplexityBot, Google-Extended, and Applebot-Extended. 15 minutes. Unblocks freshness signals immediately.
Rebuild schema markup — Validate with Google’s Rich Results Test and Schema.org’s validator in parallel. Target ≥3 entries in the sameAs array as the working pass threshold, ≥10 as the ceiling. 1 day of dev time. Highest-leverage on-site investment.
Create or verify your Wikidata entry — Target ≥12 populated properties. 2 hours of qualified time. Permanent infrastructure.
Establish author entity profiles — Person schema on author pages, Wikidata Q-number and LinkedIn for anyone whose byline appears on your high-value content. 1 week per primary author.
Reconcile trust-seed profiles — Run brand name, description, and key claims against Crunchbase, LinkedIn, G2, Capterra (where applicable), and primary external profiles. Reconcile inconsistencies, do not just document them. 2–4 weeks.
Content freshness review — Audit your 10 most-trafficked pages on a 60-day cycle. Update statistics, refresh examples, set dateModified correctly in schema with substantive deltas. Ongoing.
Factual density rebuild — On each pillar page, identify the 3–5 specific verifiable claims that make your position defensible and citeable. Make them explicit. Apply the 19-point editorial floor to all new content. Ongoing.
llms.txt (hedge, optional) — If your CMS can auto-generate it from your sitemap, ship it. If it requires manual authoring, skip it. Minutes if automated; do not invest more.

What AI visibility is not

Three practices marketed as AI visibility optimization are either ineffective or actively harmful to your trust signal. Naming them is the cheapest spend protection a team can have.

Keyword stuffing with conversational query phrases does not improve AI visibility. AI systems do not keyword-match the way search engines do — they evaluate entity confidence and factual reliability. Pages stuffed with “what is the best X” phrasings read as low-quality to both LLM training pipelines and live retrieval systems, and increasingly trigger downweighting filters in the major engines.

Publishing hundreds of thin AI-generated pages to increase crawl frequency does not build entity trust. It dilutes factual density and triggers low-quality content signals in both traditional search and AI training pipelines. Google’s March 2024 core update (which absorbed the helpful content system and shipped alongside a spam policy on scaled content abuse) and equivalent filters in Perplexity and ChatGPT’s web retrieval penalize this pattern.

Buying links from irrelevant domains to inflate authority scores has no measurable effect on AI representation. The trust signals AI systems use are structurally different from PageRank-style link authority; the relevant signals are entity-graph density and citation consistency, not raw backlink volume.

Genuine AI visibility work is slower, more structural, and more durable than any of these shortcuts. It requires treating your brand as an entity to be verified, not a document to be ranked.

What comes next in this series

This post is the foundation layer of the Citable AI visibility series. Subsequent posts go deeper on each signal: a step-by-step Wikidata property guide for brands, the full author entity stack from zero, the AI crawler robots.txt audit guide with the six crawler directives, factual density auditing as a repeatable editorial methodology, and the trust-seed source ranking with weights per AI engine.

If you want the fastest next step, run the Citable AI Visibility Checker on your domain and read the specific findings. The framework above tells you what matters. The audit tells you where you stand.

Frequently asked

Questions buyers ask before booking

What is the difference between AI visibility and SEO?

SEO ranks documents against queries. AI visibility evaluates entities against trust thresholds. A search engine asks “is this URL relevant and authoritative for this query?” An AI system asks “is this entity real, verified, and trustworthy enough to stake a recommendation on?” The signals overlap (both reward authority), but the targets differ: SEO targets a URL, AI visibility targets a Q-number. Brands with strong SEO and weak entity infrastructure can rank in Google and still be invisible inside ChatGPT, Perplexity, Claude, and Gemini.

Why are there ten signals when only nine carry weight?

Nine signals are scored against substantive weights (schema, AI crawler access, author entity, Wikipedia, Wikidata, Google Knowledge Graph, content freshness, factual density, trust-seed profiles). The tenth, llms.txt presence, is included as a 4-point hedge: it is not consumed by ChatGPT, Perplexity, Gemini, Claude, or Google AI Overviews as of 2026 (Google explicitly debunked it as a ranking factor in May 2026), but it is cheap to auto-generate and worth surfacing in the audit so the question “should we ship one?” has an evidence-based answer. Citable’s full position is in our llms.txt honest-take post (/journal/does-llms-txt-matter-2026/).

Which signal matters most for AI visibility?

It depends on the starting point. For brands with no entity footprint at all, the highest-leverage move is a complete Organization or Person schema with a dense sameAs array (10+ verified third-party profiles) — this single change improves entity resolution confidence measurably within the next crawl cycle. For brands with schema already in place, the next-highest leverage is a Wikidata entry, then author entity verification on bylines, then trust-seed profile reconciliation. Content-layer signals (freshness, factual density) compound but take longer to register.

Do AI crawlers really respect robots.txt?

Yes, with documented compliance. OpenAI (GPTBot, OAI-SearchBot), Anthropic (ClaudeBot), Perplexity (PerplexityBot), Apple (Applebot-Extended), and Google (Google-Extended) each state that they honor robots.txt directives for these user agents and control tokens, all of which were introduced in 2023–2024. A blocked crawler does not crawl, which means the AI system either uses stale training-cutoff data about the brand or has no data at all. Roughly 38% of mid-market sites audited by Citable in Q1 2026 still inherit wildcard blocks from pre-2024 configurations that pre-date the AI crawler user-agent strings.

Is a Wikipedia article required for AI visibility?

No. Wikipedia is a strong signal but not a gate. Wikidata (Wikipedia's structured-data layer) has a far lower notability bar and is open to most real, verifiable entities. A complete Wikidata entry with 10+ properties (instance of, country, industry, official website, founded date, key people) delivers most of the entity-resolution benefit that a Wikipedia article would, without the editorial-review friction. The alternative path through Wikipedia is citation-based: being referenced as a source in topically adjacent articles, rather than being the subject of a dedicated article.

How long does an AI visibility audit take and what does remediation cost?

A full 10-signal audit using the Citable AI Visibility Checker takes 30 minutes and produces specific findings per signal, not just a composite score. Remediation sequencing is cheapest-first: robots.txt fixes (15 minutes), schema sameAs density (1 day of dev time), Wikidata entry creation (2 hours), author entity profiles (1 week per author), trust-seed profile reconciliation across Crunchbase / LinkedIn / G2 / Capterra (2–4 weeks). The typical Citable engagement to move a brand from sub-40 to above-70 on the composite score runs 3–6 months, depending on starting point and the complexity of the trust-seed footprint.