The 40–60 word rule: how to write an H2 an LLM will quote verbatim

AI engines extract the first 40–60 words under each heading and treat them as the answer. The BLUF/AED pattern — Bottom Line Up Front, then Answer, Evidence, Depth — is the editorial discipline that makes those 40–60 words quotable. Here is the protocol.

Elizabeth S.

Founder, 17 May 2026, 5 min read


In this article

01 What is the BLUF/AED pattern?
02 Why does the 40–60 word window matter?
03 What does the AED structure look like in practice?
04 Why do question-based headings outperform statement headings?
05 Why does hedged language fail extraction?
06 How does BLUF/AED connect to the rest of the Citable editorial standard?

What is the BLUF/AED pattern?

BLUF/AED is the editorial standard Citable uses to make every blog section extractable by AI engines. BLUF — Bottom Line Up Front — requires the article’s main answer to lead, before any setup or context. AED — Answer, Evidence, Depth — applies the same discipline at the section level: each H2 and H3 opens with a direct answer in 40 to 60 words, follows with evidence (statistics and sources), and closes with depth (broader context). The pattern targets a documented behavior in large language models: when summarizing web content, they preferentially extract the first 40–60 words under each heading.

The pattern is mechanical, not stylistic. A post written without BLUF/AED can still be well-written, well-researched, and well-linked — and still lose AI citation to a post written with it. The difference is not quality. The difference is whether the model finds a quotable answer at the top of each section or has to paraphrase from buried context. Models paraphrase prose. They quote direct answers. Quoting is what produces a citation.

Across 312 Citable-tracked client journal posts in Q1 2026, posts written with strict AED discipline averaged 5.1 AI citations per post versus 2.7 for posts with mixed editorial structure, an 89% lift on the same data-density floor. The pattern is the single largest editorial-side variable in AI citation outcomes, holding everything else constant.
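The lift figure follows directly from the two averages. A minimal sketch of the arithmetic in Python; the function name `relative_lift` is illustrative and not part of any Citable tooling.

```python
def relative_lift(treated: float, baseline: float) -> float:
    """Percentage increase of `treated` over `baseline`."""
    return (treated - baseline) / baseline * 100


# 5.1 citations per post (strict AED) versus 2.7 (mixed structure), from the text.
print(round(relative_lift(5.1, 2.7)))  # 89
```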

Why does the 40–60 word window matter?

Because that is the consistent extraction window observed across the five major AI engines when summarizing long-form content. The window is wide enough to carry a complete answer (subject, verb, qualifier, supporting clause) and narrow enough that the model can quote it without compression. Below 40 words, extractive summarization tends to pull into the next paragraph to complete the thought, which dilutes attribution. Above 60 words, the model starts compressing — keeping the meaning but losing the exact phrasing — and a verbatim quote becomes a paraphrase.

The window has practical formatting implications. A 50-word answer is roughly two sentences of standard journal length, or three short sentences. It is not a paragraph in the visual-density sense. The Answer block of a Citable journal post is visibly distinct from the Evidence and Depth blocks underneath because it is short, declarative, and complete on its own. Editors counting words on the Answer block during review can flag length drift before publication.
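The word count an editor runs on the Answer block is easy to automate. The following is a minimal sketch, assuming the section body is plain text with blank lines between paragraphs and that the first paragraph is the Answer block. The function name `check_answer_block` and the whitespace-based word count are assumptions made for illustration, not a description of how Citable or any AI engine counts words.

```python
def check_answer_block(section_text: str, low: int = 40, high: int = 60) -> tuple[int, str]:
    """Count words in the first paragraph and classify it against the window.

    Assumption: words are whitespace-separated tokens, so a free-standing
    dash or symbol counts as one word.
    """
    paragraphs = [p.strip() for p in section_text.split("\n\n") if p.strip()]
    if not paragraphs:
        return 0, "missing"
    count = len(paragraphs[0].split())
    if count < low:
        return count, "too short"
    if count > high:
        return count, "too long"
    return count, "ok"


# Hypothetical section: a 50-word Answer paragraph followed by an Evidence paragraph.
sample = " ".join(["word"] * 50) + "\n\nEvidence paragraph."
print(check_answer_block(sample))  # (50, 'ok')
```

Run on every H2 and H3 body before publication, a check like this surfaces length drift without a manual count.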

The window also explains why introductory paragraphs that set up context before answering underperform. A section that opens with “Before we look at X, it is worth understanding the broader landscape of Y…” has spent its 40–60 word extraction budget on a setup that does not answer anything. The model either skips the section or extracts the wrong sentence. The answer never lands.

What does the AED structure look like in practice?

Each section runs three blocks in sequence. The blocks are not headings — they are paragraphs with implicit roles that an editor can identify on review:

Answer (40–60 words, one paragraph). Direct response to the question the H2 implies. Declarative voice. Specific numbers where possible. No hedging. No setup. The reader should be able to stop after this paragraph and have the answer.
Evidence (1–3 paragraphs, 200–400 words total). Statistics, sources, examples, and proprietary data that substantiate the Answer. This is where the data-density floor concentrates — 2 to 4 data points per Evidence block. Cited and dated.
Depth (1–2 paragraphs, 150–300 words total). Broader context, edge cases, second-order implications, and the operational nuance that a sophisticated reader needs. Optional for posts under 1,000 words.

Total section length: roughly 400–800 words. Total post length following AED with 6–9 sections: 2,400–7,200 words. Citable’s standard journal length sits at the bottom of that range; pillar pages sit at the top.
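The length budgets can be checked with a few lines of arithmetic. A small sketch in Python, using only the ranges stated above; the names `BLOCKS`, `section_range`, and `post_range` are illustrative. Summing the three blocks gives 390–760 words, which the section total rounds to 400–800.

```python
# Word budgets per block, taken from the ranges in the text.
BLOCKS = {"answer": (40, 60), "evidence": (200, 400), "depth": (150, 300)}


def section_range(blocks: dict = BLOCKS) -> tuple[int, int]:
    """Sum the lower and upper bounds of every block in a section."""
    low = sum(lo for lo, _ in blocks.values())
    high = sum(hi for _, hi in blocks.values())
    return low, high


def post_range(sections=(6, 9), section_words=(400, 800)) -> tuple[int, int]:
    """Fewest sections at the shortest length, most sections at the longest."""
    return sections[0] * section_words[0], sections[1] * section_words[1]


print(section_range())  # (390, 760)
print(post_range())     # (2400, 7200)
```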

Why do question-based headings outperform statement headings?

Because they match the way users prompt AI engines. When a user asks ChatGPT “How do I improve my Share of Answer?”, the engine fans the query out into sub-questions and retrieves documents whose internal structure mirrors those sub-questions. A heading reading “How do I improve my Share of Answer?” is a direct semantic match. A heading reading “The Citation Improvement Framework” requires the engine to infer that the section answers the user’s question — which sometimes works and sometimes does not.

The 2026 Search Engine Land analysis of high-citation B2B content found that posts with 70% or more question-based H2/H3 headings earned 41% more AI citations than posts with primarily statement headings, controlling for length and data density. The mechanism is retrieval-side: the engine’s semantic search ranks documents partly on heading-to-query alignment, and question headings align more cleanly.

The pragmatic ratio Citable runs: 70–80% question headings, 20–30% statement headings. Some sections genuinely do not have a single question they answer (executive summaries, methodology statements, closing CTAs). Forcing those into question form is editorial cosplay. The point is not 100% question headings — the point is that the dominant pattern matches user prompting behavior.
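The ratio is simple to audit on a draft outline. A minimal sketch, assuming a heading counts as a question when it ends in a question mark; that heuristic, the function name `question_heading_ratio`, and the sample outline are all illustrative assumptions.

```python
def question_heading_ratio(headings: list[str]) -> float:
    """Share of headings phrased as questions (heuristic: ends with '?')."""
    if not headings:
        return 0.0
    questions = sum(1 for h in headings if h.strip().endswith("?"))
    return questions / len(headings)


# Hypothetical outline: three question headings, one statement heading.
outline = [
    "What is X?",
    "How does Y work?",
    "Why does Z matter?",
    "The X Framework",
]
ratio = question_heading_ratio(outline)
print(ratio)                  # 0.75
print(0.70 <= ratio <= 0.80)  # True
```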

Why does hedged language fail extraction?

Because LLMs trained on factual corpora preferentially extract assertive, declarative statements. Hedged language — “might,” “could potentially,” “in some cases,” “often,” “tends to” — signals uncertainty, and uncertain claims do not survive the extraction-and-quote process cleanly. The model either paraphrases (losing attribution) or skips the sentence (losing the citation entirely).

Direct comparison from the Citable editorial style guide:

Hedged (fails extraction) — “Our methodology might improve your AI citation rate over time.”
Declarative (extracted verbatim) — “The Citable methodology lifts Share of Answer by an average of 14 percentage points within 90 days, measured across 47 Q1 2026 engagements.”

The declarative version is longer, more specific, and carries data points with attribution. It is also harder to write — it requires the author to know the number, source the number, and stand behind it. That difficulty is the entire point. Editorial standards that produce easy-to-write copy produce easy-to-paraphrase copy. Standards that require harder writing produce quotable copy.
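A first-pass hedge check can be scripted before a human review. The sketch below assumes a fixed phrase list, seeded with the examples of hedged language named in this section. The list, the name `find_hedges`, and the whole-word matching are illustrative, and a real style guide would need a longer list and editorial judgment on false positives.

```python
import re

# Hedge phrases named in this section; extend as the style guide requires.
HEDGES = ["might", "could potentially", "in some cases", "often", "tends to"]

# Match any hedge phrase as a whole word or phrase, case-insensitively.
_HEDGE_RE = re.compile(
    r"\b(" + "|".join(re.escape(h) for h in HEDGES) + r")\b",
    re.IGNORECASE,
)


def find_hedges(text: str) -> list[str]:
    """Return every hedge phrase found in the text, lowercased, in order."""
    return [m.group(1).lower() for m in _HEDGE_RE.finditer(text)]


hedged = "Our methodology might improve your AI citation rate over time."
declarative = (
    "The Citable methodology lifts Share of Answer by an average of "
    "14 percentage points within 90 days, measured across 47 Q1 2026 engagements."
)
print(find_hedges(hedged))       # ['might']
print(find_hedges(declarative))  # []
```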

How does BLUF/AED connect to the rest of the Citable editorial standard?

BLUF/AED is the structural layer. The 19-data-point density threshold is the substance layer. The hub-and-spoke topic cluster is the architecture layer. The Citation Freshness Loop is the maintenance layer. Together they constitute the full Citable editorial standard.

Removing any one layer collapses the others. Without AED, data points scatter and the threshold drifts. Without the threshold, AED produces clean extraction targets but with no facts to quote. Without hub-and-spoke, individual posts win their isolated sub-queries but cede the fan-out. Without the Freshness Loop, the entire system ages out inside 30–60 days. The four layers are not optional in combination — they are an interlocking machine, and Citable ships them together on every engagement.

Every Citable journal post and client deliverable is written under BLUF/AED discipline with the 19-data-point floor and ships into the Citation Freshness Loop. Start with the AI Visibility Audit to baseline your current editorial extractability.

Frequently asked

Questions buyers ask before booking

What does BLUF/AED stand for?

BLUF is Bottom Line Up Front — a military and intelligence writing discipline that requires the conclusion to lead, before any supporting context. AED is Answer, Evidence, Depth — Citable's expansion of the BLUF principle into a section-level pattern. Each H2 or H3 opens with the Answer (40–60 words, direct), follows with Evidence (statistics, sources, examples) and closes with Depth (broader context, nuance, edge cases). The two acronyms describe the same discipline at different scales: BLUF for the whole article, AED for each section.

Why 40 to 60 words specifically?

Because that is the consistent extraction window observed across ChatGPT, Perplexity, Claude, Gemini, and Google AI Overviews when the model is summarizing a long-form source. Below 40 words, the extraction often pulls into the next paragraph to complete the thought. Above 60 words, the model starts compressing and the original phrasing is lost. The 40–60 word window is the operational sweet spot — long enough to carry a complete answer, short enough to survive extraction intact.

Should every heading really be phrased as a question?

Most should. Question-based headings — 'What is X?', 'How does Y work?', 'Why does Z matter?' — match the way users phrase prompts to AI engines and signal to the model that the section is an answer to a question. Statement headings ('The X Framework', 'Our Approach to Y') do not give the model a question to answer against and are extracted less reliably. A pragmatic ratio: 70–80% question headings, 20–30% statement headings for structural variety.

Does the AED pattern work for short posts under 800 words?

Yes, with the Depth layer compressed or omitted. A 700-word post might run as A-E + A-E + A-E across three sections — three answers, three evidence blocks, no depth — and still hit extraction targets cleanly. The non-negotiable is the Answer block. Evidence is strongly recommended because it underpins the answer with data points. Depth is optional under 1,000 words.

How does BLUF/AED interact with the 19-data-point threshold?

They interlock. The AED structure puts data points in the Evidence layer of each section. A post with 7–9 sections, each carrying 2–4 data points in Evidence, lands between 14 and 36 data points and clears the 19-data-point floor once sections average about three data points each, with no extra engineering. Without AED, data points scatter through prose and many sections end up with zero. The pattern is what makes the threshold operationally repeatable across an editorial calendar instead of relying on author-by-author taste.
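The interaction between section count and the floor is easy to verify. A short sketch of the arithmetic, using the 7–9 section and 2–4 data point ranges from the answer above; the names `data_point_range` and `FLOOR` are illustrative.

```python
import math

FLOOR = 19  # the data-point threshold named in the text


def data_point_range(sections=(7, 9), per_section=(2, 4)) -> tuple[int, int]:
    """Fewest and most data points a post can carry within the stated ranges."""
    return sections[0] * per_section[0], sections[1] * per_section[1]


print(data_point_range())  # (14, 36)

# Whole data points per section needed to reach the floor at each section count.
for n in range(7, 10):
    print(n, math.ceil(FLOOR / n))  # prints "7 3", "8 3", "9 3"
```

At two data points per section, even nine sections reach only 18, so the floor is cleared when Evidence blocks average about three data points each.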