Original research · AI search · ~14 min read
What actually gets cited by AI answer engines
I joined roughly 9,500 Bing Copilot citations back to the source code of three sites I own. One finding survived every check: demand decides. Page craft is the cover charge. Demand is the lever.
Across roughly 9,500 Bing Copilot citations on three sites I own, one factor predicted whether a page got cited: whether real people search the thing the page answers. In a controlled test of 751 pages built from one identical template, search demand carried an odds ratio of 4.42 (p=0.003), while raw word count was statistically meaningless (odds ratio 1.09, p=0.64). Schema markup, server-side rendering, and question-shaped headings were not required. My single most-cited page, with 5,179 citations, has none of them. Page craft is a floor you have to clear. Demand is the lever that actually moves citations.
01 I set out to answer a dumber question
I didn't set out to prove that demand decides. I set out to answer something more practical: of all the pages I've shipped, which ones do AI answer engines actually cite, and what do those cited pages have in common that the ignored ones don't?
I had the receipts: Bing Webmaster's "AI Performance" exports, page by page and query by query, for three sites I own and operate, deliberately different animals: a live, constantly updating tier list for a popular online game; a large library of college financial-aid pages built from one template; and a set of guides for recording family stories. Three audiences, three content shapes, one set of exports. So I joined every cited URL back to the actual source code that produced it, then had two separate AI systems analyze the same exports without talking to each other.[1] Where they agreed, I trusted it. Where they fought, I went back to the raw CSV and settled it. This isn't a roundup of best practices. It's what my own traffic did.
02 The page that should have been invisible
Here's what surprised me most. My single most-cited page, the live tier list on the game site, pulled 5,179 citations. That one page is 80.9% of that site's citations and more than half of every page-level citation across all three sites. And it does everything the standard advice says will sink you: zero JSON-LD schema, content rendered entirely in client-side JavaScript so none of its ranked entries appears in the static HTML, and an <h1> that is a plain topic label instead of a phrased question.
By the playbook, that page should be invisible. It's the opposite. Bing executes the JavaScript, reads the rendered tier table, and cites it constantly. The plumbing everyone obsesses over (schema, server-side rendering, question headers) turned out to be a floor, not an engine. You need a page that can be read and lifted cleanly. Past that, the plumbing didn't move the needle.
Citations are power-law concentrated
One page earned 5,179 citations: 57.4% of all 9,019 page-level citations across three sites, and more than every other cited page combined.
| Page | Citations |
|---|---|
| Live game tier list (game site) | 5,179 |
| All 158 cited pages not listed here, combined | 2,680 |
| Game patch notes (game site) | 498 |
| Family-story prompts (stories site) | 243 |
| Auto-scholarship guide (aid site) | 218 |
| Top-school aid page (aid site) | 201 |
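The concentration figure is plain arithmetic on the table above. A minimal Python sketch, using only the counts in that table (the variable names and row labels are mine, not fields from any export):

```python
# Page-level citation counts copied from the table above.
citations = {
    "Live game tier list": 5179,
    "158 cited pages not itemized": 2680,
    "Game patch notes": 498,
    "Family-story prompts": 243,
    "Auto-scholarship guide": 218,
    "Top-school aid page": 201,
}

total = sum(citations.values())                       # 9019
top_page, top_count = max(citations.items(), key=lambda kv: kv[1])
everything_else = total - top_count                   # 3840
top_share_pct = round(100 * top_count / total, 1)     # 57.4

print(f"{top_page}: {top_share_pct}% of {total} page-level citations")
print(f"All other pages combined: {everything_else}")
```

The top page's 5,179 citations exceed the 3,840 earned by every other cited page put together, which is the sense in which one page is "more than half".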
The more useful comparison is the page that didn't make it. On the same site, a gimmick page (a random build generator) with the same domain, the same crawler access, the same internal links, even the same schema got 14 citations, while a sibling page pulled 498. The difference isn't technical, as far as I can tell. The gimmick page has no answer in it: no stat, no tier, no fact a model can quote. That's the line: cited pages carry an extractable answer; the ignored one is just a toy.
03 The thing I actually changed my mind about: depth
The belief I most wanted to keep was that craftsmanship is the lever: write the longer, richer, more thorough page and the machines reward you. The college financial-aid site let me test it cleanly. It builds 751 pages off one identical template, so I could hold everything constant and ask: does writing a deeper page get it cited?
At first glance, yes. Group the pages into thirds by word count and the citation rate climbs: 6.8% to 10.8% to 12.7%. Looks like a dose-response. Write more, win more. Tidy.
It's a mirage. The moment I split the pages by whether the school has real search demand, the depth effect collapses. High-demand schools get cited 46.6% of the time; low-demand ones, 6.2%. And inside that high-demand group, citation rate actually goes down as the pages get longer: 58.3%, then 45.8%, then 36.0%. The deep pages were getting cited because deep pages happened to be the ones I'd written for prominent schools people search.
Figure (two panels, same metric and scale, cited %): "Pooled: the mirage" (all 751 pages, by length) and "High-demand only: the truth" (just the schools people search). Citation rate by word-count tertile. Pooled, depth looks decisive. Isolate the high-demand pages and it inverts: the longer ones get cited less.
| Word-count tertile | All 751 pages | High-demand pages only |
|---|---|---|
| Shortest third | 6.8% | 58.3% |
| Middle third | 10.8% | 45.8% |
| Longest third | 12.7% | 36.0% |
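The pooled-versus-stratified comparison in the table can be sketched in a few lines of Python. This is an illustration, not the analysis script behind the numbers: the `Page` fields are hypothetical names, and the sketch assumes the thirds are re-cut inside whatever subgroup it is handed.

```python
from dataclasses import dataclass


@dataclass
class Page:
    word_count: int     # hypothetical field: words on the page
    cited: bool         # hypothetical field: cited at least once
    high_demand: bool   # hypothetical field: the school has real search demand


def tertile_rates(pages):
    """Citation rate (0 to 1) for the shortest, middle, and longest third."""
    ranked = sorted(pages, key=lambda p: p.word_count)
    n = len(ranked)
    cuts = [0, n // 3, 2 * n // 3, n]
    rates = []
    for lo, hi in zip(cuts, cuts[1:]):
        group = ranked[lo:hi]
        # None marks an empty third (fewer than three pages in total).
        rates.append(sum(p.cited for p in group) / len(group) if group else None)
    return rates


def pooled_vs_high_demand(pages):
    """Same metric twice: once on every page, once on high-demand pages only."""
    high = [p for p in pages if p.high_demand]
    return {"pooled": tertile_rates(pages), "high_demand_only": tertile_rates(high)}
```

Running both cuts side by side is the whole trick: if the trend across thirds changes direction once the pages are restricted to one demand group, the pooled trend was driven by which pages sit in which third, not by length itself.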
When I put word count into a regression next to demand, word count came out statistically meaningless (odds ratio 1.09, p=0.64) while demand was the one variable that mattered (odds ratio 4.42, p=0.003). The proof is brutal and specific: 84% of my 167 deepest pages got zero citations. One of my deepest pages runs 2,713 words with the full dataset filled in, on a school almost nobody searches, and it has never been cited, while a thinner page on a school everyone searches gets cited constantly. I wrote a small book nobody asked for.
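For readers less used to odds ratios, here is a rough way to see what 4.42 and 1.09 mean in terms of citation rates. The 10% baseline is a hypothetical picked only for illustration, not a number from the regression, and the helper name is mine.

```python
def apply_odds_ratio(baseline_rate, odds_ratio):
    """Convert a baseline rate to odds, scale by the odds ratio, convert back."""
    baseline_odds = baseline_rate / (1.0 - baseline_rate)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


BASELINE = 0.10  # hypothetical 10% citation rate, for illustration only

with_demand = apply_odds_ratio(BASELINE, 4.42)      # about 0.329
with_more_words = apply_odds_ratio(BASELINE, 1.09)  # about 0.108

print(f"demand:     {BASELINE:.1%} -> {with_demand:.1%}")
print(f"word count: {BASELINE:.1%} -> {with_more_words:.1%}")
```

From the same starting point, an odds ratio of 4.42 roughly triples the citation rate, while 1.09 moves it by less than one percentage point. That small shift is also the one the regression could not tell apart from noise.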
The whole study in one line
Demand is the lever. Craftsmanship is the cover charge.
One honest footnote: structured richness (the extractable units, the stat blocks, the FAQ) did keep a small real effect, odds ratio about 1.5. So how you structure a page matters a little. That you write more doesn't.
04 What I was sure mattered, and didn't
This is the part that should make any "GEO checklist" vendor nervous. I pulled the on-page features everyone sells (quick-answer blocks, tables, FAQ schema, fresh data, source links, "last verified" dates) and compared how often the cited pages and the uncited pages had each one. The uncited pages score higher on nearly every single feature.
| On-page feature | Cited pages | Uncited pages |
|---|---|---|
| Quick-answer block | 65.6% | 99.0% |
| Data tables | 58.3% | 94.9% |
| FAQ schema | 66.3% | 95.9% |
| Fresh / dynamic data | 76.1% | 94.9% |
| Source links | 81.6% | 97.5% |
| "Last verified" date | 83.4% | 95.2% |
Read that table the right way: these features are a floor, not a selector. Ship below it and you risk being fetched and ignored. Clear it and you're merely eligible. Demand is what chooses. And the convergence with the outside research is hard to ignore: a controlled Ahrefs study of 1,885 pages found adding JSON-LD schema produced no meaningful citation uplift,[4] and SE Ranking's analysis of nearly 300,000 domains found llms.txt had zero correlation with citations.[5] My first-party data and their cross-site data point the same direction: the plumbing is hygiene, not the win.
05 A few other things the data made me face
Every one of these sites is only capturing about a quarter of the citations available on queries it already shows up for. Even on the queries where my pages already appear, most of the citation pool is going to someone else. Closing that gap on pages I already rank for is often a bigger prize than building new ones.
And speed matters more than I gave it credit for. My fast site started getting cited on the first day of reporting; my slowest site took 64 days to land its first citation. I can't cleanly pull cause from this: the 64 days is measured from the day Bing began reporting, so it tangles a page's age with its indexing speed. But the fast site pings Bing on every publish (IndexNow) and the slow one doesn't, and that's a cheap thing to copy.
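Pinging on publish is a small amount of code. Below is a minimal Python sketch of an IndexNow submission, assuming the shared `api.indexnow.org` endpoint and a key file you already host on your own domain. The host, key, and URLs in the usage note are placeholders, and this is not the publish hook my fast site actually runs.

```python
import json
import urllib.request
from urllib.parse import urlparse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"


def build_indexnow_payload(host, key, urls, key_location=None):
    """Build the JSON body for a bulk IndexNow submission."""
    url_list = list(urls)
    for url in url_list:
        # Every submitted URL has to belong to the host the key is published on.
        if urlparse(url).hostname != host:
            raise ValueError(f"{url} is not on host {host}")
    payload = {"host": host, "key": key, "urlList": url_list}
    if key_location:
        payload["keyLocation"] = key_location
    return payload


def submit_to_indexnow(payload, endpoint=INDEXNOW_ENDPOINT, timeout=10):
    """POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

In a publish hook this would look something like `submit_to_indexnow(build_indexnow_payload("example.com", "your-key", ["https://example.com/new-page"]))`. Whether a ping shortens the time to a first citation is exactly the causal question this data can't settle; it is cheap enough to do anyway.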
06 Then I tested two engines I had no data for
Everything above is Bing Copilot. After I'd written it, I ran the same kind of test on Perplexity and ChatGPT Search, 25 queries against the same three sites, and one result stopped me cold.
On Bing, one mid-tier aid page has never been cited, not once. It's one of the hundreds of dead pages in the pile. But Perplexity cited it at position 4 and ChatGPT cited it at position 5, independently, off the exact same template that produces all my Bing zeros. And the kicker: my heaviest Bing page in that set, at 201 citations and indexed clean on Google, was cited by neither of them.
So the two chat engines behave almost like the inverse of Bing. Bing rewards the school people actually search; the chat engines reward the template that bothers to spell out the aid tiers when the school's own site is vague about them. The sharpest single number: all four of ChatGPT's citations to that site are pages Bing cited zero times, while my biggest Bing page in that set was cited by neither chat engine.[2]
The cross-engine headline
Bing citations are demand-gated. Cross-engine citations are template-gated. They agree only where both are true.
A page that's dead weight on Bing can be the one carrying you on Perplexity and ChatGPT, and you'd never know if you didn't test both. So far, the only page all three engines agree on is the game tier list, where real demand and a strong template happen to coincide. The template read is the leading inference from a 25-query probe, not a mechanism I can prove.
07 The playbook I actually use now
I turned the findings into a decision stack I run before building anything. The one rule that overrides the rest: Gate 1 selects citations on Bing; Gates 2–4 only keep you eligible. Clearing them earns nothing on its own, because the uncited pages clear them too. If Gate 1 fails, stop. Don't build.
Gate 0: Which engine?
Bing Copilot is demand-gated; Perplexity and ChatGPT Search are template-gated. Decide your target before you build, and never use your Bing winners as a map for the chat engines.
Gate 1: Does demand exist?
Find a real grounding query for the exact entity, or use a demand proxy (for colleges, having a Common Data Set entry raised citation odds 7.5×). A small real pool beats a big imaginary one. And don't fragment against aggregate demand: on the game site, none of the 45 queries named an individual sub-entity, so splitting the hub into one page per sub-entity would chase demand that isn't there.
Gate 2: Architecture
An answer-first block rendered in the DOM (this is the differentiator), discrete extractable units (lists, stat blocks, FAQ), and hub-and-spoke links. Client-side rendering is fine as long as the facts actually render. For the chat engines, use a procedural/how-to format, not a bare list, for anything a model would otherwise generate from memory.
Gate 3: Depth
Don't pad word count; it's statistically meaningless here. Do add structured, extractable units: the one depth signal that kept a real (if modest) effect. Depth is polish on a demand-qualified page, never a substitute for demand.
Gate 4: Pre-publish QA
Answer block up top, schema and breadcrumbs present, data in tables or lists, source links, a visible "last verified" date. Pass and you're eligible, not chosen. Remember the table above: your uncited pages pass this too.
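Most of the Gate 4 checklist can be checked mechanically. The sketch below uses Python's standard `html.parser` and covers four of the items; the function name and the result keys are my own, and "answer block up top" is left to a human because there's no reliable tag to look for. It reads whatever HTML string it is given, so for a client-rendered page you would feed it a snapshot of the rendered DOM, not the raw server response.

```python
from html.parser import HTMLParser


class _AuditParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.has_json_ld = False
        self.has_table_or_list = False
        self.has_source_link = False
        self.visible_text = []
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script" and (attr.get("type") or "").lower() == "application/ld+json":
                self.has_json_ld = True
        elif tag in ("table", "ul", "ol"):
            self.has_table_or_list = True
        elif tag == "a" and (attr.get("href") or "").startswith(("http://", "https://")):
            self.has_source_link = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies so only reader-visible text is searched.
        if not self._skip_depth:
            self.visible_text.append(data)


def audit(html):
    """Return pass/fail flags for the mechanical parts of the pre-publish QA."""
    parser = _AuditParser()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.visible_text).split()).lower()
    return {
        "schema": parser.has_json_ld,
        "tables_or_lists": parser.has_table_or_list,
        "source_links": parser.has_source_link,
        "last_verified": "last verified" in text,
    }
```

A page that returns all `True` is eligible, nothing more. The uncited pages in my data would pass this audit too.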
08 What to stop worrying about
Each of these gets credited with driving citations, and each got cut as non-causal once I controlled for demand:
- JSON-LD schema as a driver. My top two pages have none, for 5,677 citations between them. Keep shipping schema for traditional SEO. Just don't credit it for the AI win.[4]
- Server-side rendering as a requirement. My most-cited page is 100% client-side and earns 5,179 citations. Bing renders it. "Bing can read CSR" is true; "you must SSR" is false.
- Raw depth and word count. Odds ratio 1.09, not significant. It only ever correlated because deep pages happened to be the ones written for high-demand schools.
- Question-shaped H2s. Zero percent of my cited templates use them. It's not a thing my cited pages do.
- Splitting hubs into per-entity pages when demand is aggregate.
- Sitemap priority. One of my pages carried priority 0.9 and earned zero citations.
- llms.txt as a citation lever. Present on only 3.1% of my cited pages, and SE Ranking found no correlation across ~300,000 domains.[5]
The flip side, and the reason this very report exists: the highest-leverage content lever is original data. The Princeton GEO paper found that adding real statistics measurably lifts how often generative engines cite you.[3] So the most citable thing I can publish about getting cited isn't more advice. It's my own numbers.
09 How I know this (and what it can't tell you)
- Data source: Bing Webmaster Tools "AI Performance" exports (daily, per-page, and grounding-query reports).
- Sites: Three content sites I own and operate: a live game tier list, a single-template library of college financial-aid pages, and a set of family-story guides.
- Window: 62–180 days per site, through June 17, 2026.
- Volume: ~9,500 citations by Bing's daily total; 9,019 at the page level (the two exports don't perfectly reconcile).
- Method: Each cited URL joined to its source code, then re-analyzed independently by two AI systems; a 751-page single-template regression isolated depth from demand.
- Cross-engine: A 25-query probe on Perplexity and ChatGPT Search. Directional, not a second dataset.
The honest boundary
This is observational, roughly one page per cell, with demand tangled up in nearly everything. I can describe what got cited. I can't run the clean experiment that proves why. The strongest causal-sounding claims here are really strong associations, and the cross-engine probe is a small exploratory sample. Anyone who tells you they've cracked the causal mechanism off data like this is selling something. Gate 1, demand, is the one thing the data lets me bet on hard.
10 FAQ
What matters most for getting cited by AI?
Search demand. Across roughly 9,500 Bing Copilot citations on three sites I own, whether real people searched the thing a page answered was the dominant predictor of citation, at odds ratio 4.42, p=0.003 in a controlled 751-page test. Page quality is a prerequisite you have to clear, not the lever that moves citations.
Does schema markup help you get cited by AI?
Not as a direct driver. My single most-cited page, with 5,179 citations, carries no JSON-LD schema at all, and a controlled Ahrefs study of 1,885 pages found no meaningful citation uplift from adding schema. Keep schema for traditional SEO; don't expect it to win you AI citations.
Does longer or more in-depth content get cited more often?
No, once you control for demand. In a 751-page test built from one identical template, raw word count was statistically meaningless (odds ratio 1.09, p=0.64), and 84% of the deepest pages earned zero citations. A 2,713-word page about a school almost nobody searches was never cited once.
Is server-side rendering required to be cited by AI?
No. My most-cited page renders entirely in client-side JavaScript, and none of its underlying data appears in the static HTML, yet it earned 5,179 citations, because Bing executes the JavaScript and reads the rendered page. The facts must reach the rendered DOM; they do not have to be server-rendered.
Do different AI engines cite the same pages?
Often the opposite. Bing citations are demand-gated; Perplexity and ChatGPT Search citations are template-gated. All four of ChatGPT's citations to one of my sites were pages Bing had cited zero times, while my biggest Bing page on that same site (201 citations) was cited by neither chat engine. Don't use one engine's winners as a map for another's.
Demand decides. Build for the question people are actually asking, make the answer easy to lift, and stop polishing pages the search box has never heard of. If you want help applying this to a site you own, that's the work I do.
Sources & method
- [1] First-party data: Bing Webmaster Tools "AI Performance" exports (daily, per-page, and grounding-query reports) for three content sites I own and operate, through 2026-06-17. Each cited URL was joined to its source code; the exports were independently re-analyzed by two AI systems (Claude and GPT-5.5), and a 751-page single-template regression isolated content depth from search demand. Full method and limitations are in §9 above.
- [2] Cross-engine probe: 25 queries run against the same three sites on Perplexity Pro (Search mode) and ChatGPT Search, one run per engine, June 2026. Exploratory and directional: enough to test whether the Bing findings transfer, not a second dataset.
- [3] Aggarwal, Murahari, et al., "GEO: Generative Engine Optimization", ACM KDD 2024. Adding attributed statistics and quotations measurably increases visibility in generative-engine answers.
- [4] Ahrefs, "We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved." (May 2026): a difference-in-differences study finding no meaningful AI-citation uplift from adding JSON-LD schema across Google AI Overviews, AI Mode, and ChatGPT.
- [5] SE Ranking, "LLMs.txt: Why Brands Rely On It and Why It Doesn't Work" (Nov 2025): analysis of nearly 300,000 domains finding no correlation between llms.txt and AI citations; corroborated by Google's John Mueller (July 2025): "no AI system currently uses llms.txt."