AI content poisoning: the threat model for your brand in AI search

Attackers can corrupt what ChatGPT, Perplexity, and Gemini say about your brand through data poisoning and RAG poisoning. Here's the threat model and the defensive playbook.

Elizabeth S.

Founder 8 min read

Share
Summarize with AI
In this article
  1. 01 What is AI content poisoning, and why is it a security problem?
  2. 02 How little does it actually take to poison a model?
  3. 03 Which surfaces is your brand exposed on?
  4. 04 What does the defensive playbook look like?
  5. 05 What’s the difference between defending and reacting?

Attackers can change what AI models say about your brand, and the research now shows it takes far less effort than most teams assume. Data poisoning, prompt injection, and retrieval poisoning are no longer theoretical: peer-reviewed and lab-confirmed studies have measured the cost, and it is low. This is not the reactive question of cleaning up a wrong answer after the fact — that is covered in our piece on fixing incorrect AI brand mentions. This is the security threat model: how brand reputation in AI search gets corrupted on purpose, and the proactive defense that raises the cost of doing it.

Read it as a threat model. Identify the surfaces, understand the attacks, then build the backbone.

What is AI content poisoning, and why is it a security problem?

AI content poisoning is the deliberate corruption of the data a language model trains on or retrieves from, so the model produces attacker-chosen output. When the target is a brand, the output is what ChatGPT, Perplexity, Gemini, or Claude tells a buyer about your company, your products, your security posture, or your leadership.

This belongs in the security column, not the marketing column. The OWASP Gen AI Security Project’s Top 10 for LLM Applications 2025 lists Prompt Injection as LLM01 — the single highest-ranked risk — and Data and Model Poisoning as LLM04, with a dedicated Misinformation category alongside. These are recognized application vulnerabilities with the same standing as injection flaws and broken access control. The difference is the blast radius: the corrupted output lands on the answer surface your buyers now trust more than your own homepage.

Three attack families matter for brands.

Training poisoning. The attacker plants content where models scrape training data — open web pages, forums, repositories, archives — so the false claim is baked into the model’s weights. This is the hardest to reverse because it does not depend on a live source you can take down.

RAG poisoning. Modern AI search leans on retrieval-augmented generation: the model pulls live documents at query time and synthesizes an answer. The attacker plants content the retriever will surface for a target question, hijacking the answer without ever touching the model’s weights.

LLM grooming. This is mass-publishing of false content at scale — hundreds or thousands of pages repeating the same fabricated claim — so that whichever path the model takes, training or retrieval, the lie is the most-repeated signal. It exploits the same weakness that makes consistent, authoritative information work in your favor: models weight repetition and apparent consensus.

How little does it actually take to poison a model?

Less than the size of the training set would suggest. The headline finding came in 2025 from Anthropic, the UK AI Security Institute, and the Alan Turing Institute: roughly 250 malicious documents were enough to implant a backdoor in a large language model, and the number stayed near-constant across model sizes from 600 million to 13 billion parameters. The arXiv paper puts the scale in perspective — 250 documents is about 0.00016% of a 13-billion-parameter model’s training data. The Turing Institute’s own write-up concludes plainly that LLMs are more vulnerable to data poisoning than previously thought.

The intuition most people carry — that you would need to corrupt some meaningful percentage of a model’s diet to move it — is wrong. The cost is a roughly fixed, small number of documents, which means the attack does not get harder as models get bigger.

Retrieval is even cheaper to attack. In the PoisonedRAG study presented at USENIX Security 2025, injecting around five malicious texts per target question into a knowledge base of millions of documents reached roughly a 90% attack success rate. Five passages, against millions, to reliably flip a specific answer. RAG is the layer most AI search products use to stay current, which makes it the layer most exposed to targeted brand attacks.

The harm is not limited to a single wrong fact, either. A medical-domain study published in Nature Medicine (NYU Langone Health) found that replacing just 0.001% of training tokens with misinformation produced 7 to 11% more harmful completions — a measurable, compounding degradation from a vanishingly small input. For a brand, the equivalent is a steady drift toward an attacker’s framing every time the model is asked about you.

Which surfaces is your brand exposed on?

Your brand’s exposure follows the model’s data path. Each surface an attacker can write to is a surface they can poison. Hardening the publish surfaces you control — the schema, llms.txt, and agents.json AI reads about you — closes the easiest write paths before an attacker finds them.

SurfaceAttack familyWhy it’s exposed
Open web pages about your brandTraining + LLM groomingScraped into training corpora; mass-published lookalikes outweigh sparse truth
Wikis, forums, Q&A, reposTrainingHigh-trust sources models learn from; editable by anyone
Live retrieval index (RAG)RAG poisoning~5 planted passages can flip a target answer
Pages your own AI agents readPrompt injectionHidden instructions hijack an agent’s summary of you
Stale or unreachable owned contentIndirectA weak authoritative source loses to a strong planted one

The last row is the one teams control directly and ignore most often. If your own authoritative content is thin, stale, or unreachable to AI crawlers, you have left the field open: the model has no strong, consistent version of your brand to anchor on, so a planted version wins by default. We covered the structural fix for this in what is a Context Hub and the disambiguation fix in entity disambiguation in AI search.

What does the defensive playbook look like?

Defense against poisoning is proactive and structural, not reactive cleanup. You cannot patch a model you do not own, and you cannot out-publish a determined attacker on volume alone. What you can do is make the authoritative version of your brand the cheapest, most consistent, most trusted signal available — so poisoning has to overcome a strong incumbent rather than fill a vacuum. The entity grounding that makes visibility durable across model updates is the same asset that makes poisoning expensive to attempt. Four layers.

Build the entity backbone

Give the model one unambiguous version of your brand across every source it trusts: a clean Wikidata entry, consistent sameAs links across owned profiles, structured data that agrees with itself, and a Knowledge Panel that resolves the entity. When the entity is well-defined and consistent, a planted contradiction stands out as the outlier instead of filling an information gap. This is the same backbone that powers schema markup for AI — it does double duty as both a visibility asset and a poisoning defense.

Own the authoritative surfaces

The more of your brand’s canonical information lives on surfaces you control and keep current, the less an attacker can supply. An owned, maintained, machine-readable source — your site, your structured data, a published agents.json manifest — is a source the model can prefer. Thin owned content is an invitation.

Keep the infrastructure reachable and intact

This is where poisoning defense meets engineering. Three failure modes silently undermine everything above: AI crawlers like GPTBot, ClaudeBot, and PerplexityBot blocked or throttled at the edge so they never reach your authoritative content; CI/CD pipelines that overwrite schema markup, canonicals, and llms.txt on deploy and quietly erase your progress; and render paths that require JavaScript the crawlers cannot execute, leaving your strongest signals invisible. A site that cannot be crawled, or that resets its own authoritative markup on every deploy, is poisoning-vulnerable by construction — not because anyone attacked it, but because it left the door open.

This is the work behind Citable’s AI-Ready Infrastructure. The Infrastructure Security Audit tier is the one that maps directly to the threat model in this piece: hardening, access and secrets management, and a full vulnerability assessment with remediation, plus an optional monitoring retainer. It treats your brand’s presence in AI search as an attack surface and secures it like one.

Monitor continuously

Monitoring is the control that turns a slow-bleeding poisoning attack into a contained incident. Re-run a fixed set of brand prompts across ChatGPT, Perplexity, Gemini, and Claude on a regular cadence, and diff the answers over time. A poisoned answer is detected in days instead of surfacing when a prospect quotes it back to you in a sales call. The monitoring retainer attached to the Infrastructure Security Audit exists for exactly this: catching the drift early, while the correction window is still short.

What’s the difference between defending and reacting?

Reacting is what you do after a wrong answer appears; defending is what makes that answer expensive to plant and fast to overwrite. The two share a toolkit — entity backbone, authoritative sources, schema — but they sit at opposite ends of the timeline and cost curve.

Defense (proactive)Reaction (after the fact)
TriggerBuilt before any attackA wrong answer is already live
CostLower, one-time backbone workHigher, repeated correction cycles
Time to safeContinuous; drift caught in days30–90 days to overwrite an entrenched claim
Attacker’s jobOvercome a strong incumbent signalFill an information vacuum
Primary controlEntity backbone + monitoringAuthoritative third-party correction

The asymmetry is the whole argument. A vacuum is cheap to poison; a strong, consistent, monitored entity backbone is expensive to poison and quick to repair. The research is unambiguous that the attack side is cheap — 250 documents, five retrieval passages, a fraction of a percent of training tokens. The only variable you control is how strong the incumbent signal is when the attack arrives.

If your brand has no deliberate AI presence yet, start with the foundations on pricing and the structural pieces — context hubs, schema, entity disambiguation — that make poisoning hard. If your brand is already visible in AI search and you want it secured against deliberate corruption, the Infrastructure Security Audit is where the threat model in this piece becomes a remediation plan. Either way, the move is the same: stop treating your AI presence as a marketing output and start treating it as an attack surface, because that is exactly what the research has shown it to be.

How little it takes to poison what a model says

250

Malicious documents to backdoor an LLM

Near-constant from 600M to 13B parameters — about 0.00016% of a 13B model's training data.

~5

Injected texts to hijack a RAG answer

PoisonedRAG reached ~90% attack success per target question in a multi-million-document store.

0.001%

Training tokens swapped for misinformation

Produced 7–11% more harmful completions in a medical-domain study (Nature Medicine).

Frequently asked

Questions buyers ask before booking

What is AI content poisoning?

AI content poisoning is the deliberate planting of false or malicious content into the data that language models train on or retrieve from, so the model produces attacker-chosen output. For brands, that means corrupting what AI assistants say about your company, products, security, or leadership.

How many malicious documents does it take to poison a model?

Research from Anthropic with the UK AI Security Institute and the Alan Turing Institute found that roughly 250 malicious documents can implant a backdoor in a large language model regardless of model size, tested from 600M to 13B parameters. That is a near-constant number, not a percentage that scales with the training set.

Is RAG poisoning different from training poisoning?

Yes. Training poisoning corrupts the model during training and is hard to reverse. RAG poisoning targets the live retrieval layer: the PoisonedRAG study showed that injecting roughly five malicious texts per target question into a multi-million-document knowledge base reached about a 90% attack success rate. RAG poisoning is faster to exploit and faster to defend against.

Can prompt injection change what AI says about my brand?

Indirectly, yes. Prompt injection is OWASP's number one LLM risk (LLM01). Hidden instructions embedded in a page an AI agent reads can manipulate its summary of your brand. The defense is the same backbone that resists poisoning: authoritative, consistent, controlled sources the model trusts more than planted ones.

How do I defend my brand against AI poisoning?

Build a strong entity backbone so the model has one consistent, authoritative version of your brand, own the authoritative surfaces it retrieves from, keep your infrastructure reachable by AI crawlers, and monitor brand answers continuously so corruption is caught in days. Citable's Infrastructure Security Audit covers the hardening and monitoring side of that defense.

Ready to be cited by AI?

Two paths in. Free check tells you where you stand in 10 seconds. Paid audit tells you exactly what to fix, with a baseline you can measure forward from.

Run the free check Book the audit · €1,200

Prefer to talk first? Get in touch