Paul Takisaki

Lesson 05 · GEO Foundations · ~9 min · measure → loop

The measurement loop

You're citable (Lesson 2), backed by consensus (Lesson 3), and reachable (Lesson 4). One question is left: are you actually showing up? You can't improve what you don't measure, and AI answers move week to week.

Lesson 5 of 5

In ~9 minutes you'll be able to

Build a fixed prompt set that mirrors how buyers actually ask.
Score the four signals that matter: presence, share of voice, citations, sentiment.
Run a weekly/monthly loop, and read the volatility without panicking.

The whole lesson in one block:

The answer, up front

You measure AI visibility by running a fixed set of real buyer questions across the engines that matter (ChatGPT, Perplexity, Google AI Overviews, Gemini) in a fresh, signed-out session, and logging four things: were you mentioned (presence), what share of the named options were you (share of voice), which sources got cited, and how you were framed (sentiment). Then you repeat on a cadence and watch the trend. No tool is required to start: a spreadsheet and 90 focused minutes beats a dashboard you never open.

01 Why measure at all

Two things are true at once, and you need both in view. AI answers are a real and fast-growing surface, and the underlying data is noisy. Measuring is how you tell signal from noise instead of reacting to a single lucky (or unlucky) screenshot.

≈16% of US Google searches show an AI Overview (after peaking at ~25%)^[1]

10×+ growth in US AI-referral traffic in 7 months (Adobe)^[2]

~11× higher conversion rate for AI visitors than for search visitors (1.66% vs 0.15%)^[3]

But keep the counterweight in mind: AI is still only about 1% of overall web traffic for most sites.^[3] So the honest read is "small, high-quality, growing fast": worth tracking deliberately, not worth panicking over a single bad week. That's exactly what a loop is for.

02 The four signals to track

Forget vanity. These four answer "are we in the answer, and how?", and you can score every one of them by hand.

Presence

Of your prompt set, what share named you at all? The simplest, most honest top-line.

log: mentioned ÷ total prompts

Share of voice

When brands get named, what fraction of those mentions is you vs. 3 to 5 tracked competitors? The ratio stays comparable from run to run even as the wording of the answers drifts.

log: your mentions ÷ all brand mentions

Citation share

Which sources did the engine link? Tag each as owned, editorial, UGC, or competitor; it tells you where to work next.

log: the URLs cited, bucketed

Sentiment

When you're named, how? Enthusiastic, with caveats, or dismissed. A mention isn't always a win.

log: positive / neutral / negative
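The three ratio-style signals above (presence, share of voice, sentiment) can be sketched as a few lines of Python. This is a minimal illustration, not a prescribed implementation: the brand name "Acme", the rival names, the prompts, and the row layout are all hypothetical placeholders standing in for whatever you log by hand.

```python
from collections import Counter

# Hypothetical hand-logged results: one dict per prompt run.
# "brands" lists every brand the answer named; "sentiment" is only
# filled in when our own brand appears in the answer.
OUR_BRAND = "Acme"  # placeholder name, not from the lesson

rows = [
    {"prompt": "best X for Y", "brands": ["Acme", "Rival A", "Rival B"], "sentiment": "positive"},
    {"prompt": "Acme vs Rival A", "brands": ["Acme", "Rival A"], "sentiment": "neutral"},
    {"prompt": "is Rival B worth it?", "brands": ["Rival B"], "sentiment": None},
    {"prompt": "top tools for Y", "brands": ["Rival A", "Rival B"], "sentiment": None},
]

def presence_rate(rows, brand):
    """Prompts that mentioned the brand, divided by total prompts."""
    mentioned = sum(1 for r in rows if brand in r["brands"])
    return mentioned / len(rows)

def share_of_voice(rows, brand):
    """Our mentions divided by all brand mentions across the set."""
    all_mentions = sum(len(r["brands"]) for r in rows)
    ours = sum(r["brands"].count(brand) for r in rows)
    return ours / all_mentions if all_mentions else 0.0

def sentiment_counts(rows, brand):
    """Tally positive / neutral / negative over answers that named the brand."""
    return Counter(r["sentiment"] for r in rows if brand in r["brands"])

print(f"presence: {presence_rate(rows, OUR_BRAND):.0%}")
print(f"share of voice: {share_of_voice(rows, OUR_BRAND):.0%}")
print(dict(sentiment_counts(rows, OUR_BRAND)))
```

Note that presence and share of voice can diverge: in this made-up data the brand appears in half the prompts but accounts for only a quarter of all brand mentions.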

03 The loop: a no-tech method

You can run this with a spreadsheet today. The discipline matters more than the tooling: same prompts, same engines, same scoring, every time.^[4]

Write ~30 prompts across three intents. 10 discovery (no brand names: "best X for Y"), 10 comparison (you vs. named rivals), 10 decision ("is X worth it?"). Phrase them the way a real buyer would type. → your prompt set

Pick the engines your buyers use. Usually ChatGPT, Perplexity, Google AI Overviews, and Gemini. Don't measure engines your audience never opens. → coverage

Run it clean. One prompt per fresh conversation, signed out or in a temporary/incognito chat, on the consumer app, not the API (the API often runs a different model with web search off). → what buyers actually see

Log the four signals + the date. Presence, share of voice, cited sources, sentiment. Paste the full answer so you can re-check it later. → your scorecard

Set a cadence and compare runs. Monthly is the floor; weekly around a launch. Act on the trend across runs, never a single reading. → the loop
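If you later want the spreadsheet as a CSV file, the logging step could look roughly like the sketch below. Everything named here is an assumption for illustration: the column names, the example domains, the "Acme" brand, the sample date, and the choice to treat any unrecognized domain as editorial (which you would still review by hand).

```python
import csv
import io
from datetime import date
from urllib.parse import urlparse

# Hypothetical domain lists: replace with your own sites and tracked rivals.
OWNED = {"acme.example"}
COMPETITOR = {"rival-a.example", "rival-b.example"}
UGC = {"reddit.com", "quora.com"}

def bucket(url):
    """Tag a cited URL as owned, competitor, ugc, or editorial."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in OWNED:
        return "owned"
    if host in COMPETITOR:
        return "competitor"
    if host in UGC:
        return "ugc"
    # Assumption: anything unrecognized is treated as editorial; review by hand.
    return "editorial"

# Hypothetical scorecard columns: the four signals plus the date.
FIELDS = ["date", "engine", "intent", "prompt", "mentioned",
          "brands_named", "cited_buckets", "sentiment", "full_answer"]

def make_row(run_date, engine, intent, prompt, brands, cited_urls,
             sentiment, full_answer, our_brand="Acme"):
    """Build one scorecard row from a single prompt run."""
    return {
        "date": run_date.isoformat(),
        "engine": engine,
        "intent": intent,
        "prompt": prompt,
        "mentioned": our_brand in brands,
        "brands_named": "; ".join(brands),
        "cited_buckets": "; ".join(bucket(u) for u in cited_urls),
        "sentiment": sentiment,
        "full_answer": full_answer,
    }

row = make_row(
    date(2025, 1, 6), "Perplexity", "discovery", "best X for Y",
    ["Acme", "Rival A"],
    ["https://www.reddit.com/r/example", "https://acme.example/guide"],
    "positive", "(paste the full answer text here)",
)

buffer = io.StringIO()  # stand-in for a real file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Keeping the full answer in the row is what lets you re-score an old run later if you change how you judge sentiment or bucket a source.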

04 Visibility Scorecard: score one round

Here's a six-prompt set across the three intents. Click each prompt where your imagined brand got named, and watch your presence rate compute: exactly what you'll do for real in a spreadsheet.

Toggle the prompts where you'd be mentioned. This computes presence rate (one of the four signals) on a sample set; the real method tracks share of voice and sentiment too.

Mark every prompt where an engine would name you today. Be honest: guessing high is how people fool themselves into skipping the work.

Discovery prompts are the hardest and the most valuable. No brand is named in the question, so getting cited here means the engine reached for you unprompted: pure earned visibility.

Discovery, again. If you're invisible on the no-brand "what do people use" questions, that's where the consensus work from Lesson 3 pays off.

Comparison prompts test your share of voice. Being in the comparison at all is the win; track what fraction of the named options is you.

"Alternatives to a rival" is a gift query. If you're not listed when someone's shopping away from a competitor, that's a fast, concrete gap to close.

Decision prompts reveal sentiment. Getting named is good; getting named with caveats ("but reviews are mixed") is a different signal: log how you're framed, not just whether.

Specific decision queries are where you win or lose the sale. Long, constrained questions are exactly the fan-out from Lesson 1, and the citable passages from Lesson 2 are what get quoted here.

In the real loop you'd run ~30 of these per engine and re-run monthly. Presence is just the start; share of voice and sentiment tell you the rest.
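As a rough sketch of the scorecard arithmetic, here is the presence rate for a six-prompt sample, broken down by intent. The prompt wording and the True/False marks are invented placeholders, not the lesson's actual sample set.

```python
from collections import defaultdict

# Hypothetical six-prompt sample: (intent, prompt, were_we_named).
sample = [
    ("discovery", "best X for Y", False),
    ("discovery", "what do people use for Y", False),
    ("comparison", "Acme vs Rival A", True),
    ("comparison", "alternatives to Rival A", True),
    ("decision", "is Acme worth it?", True),
    ("decision", "is Acme good for a small team on a budget?", False),
]

def presence(rows):
    """Fraction of prompts where the brand was named."""
    return sum(1 for _, _, named in rows if named) / len(rows)

def presence_by_intent(rows):
    """Presence rate computed separately for each intent."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    return {intent: presence(group) for intent, group in groups.items()}

print(f"overall: {presence(sample):.0%}")
for intent, rate in presence_by_intent(sample).items():
    print(f"{intent}: {rate:.0%}")
```

Splitting by intent shows where the gap is: an overall 50% in this made-up data hides a 0% on the discovery prompts.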

05 Check your understanding

3 quick checks

Q1 You want your measurement to match what buyers actually see. Where do you run your prompts?

Right. The API often runs a different model with web search off, and a logged-in session personalizes results; both distort what a real prospect sees. Fresh, signed-out, consumer surface.^[4]

Q2 Your presence rate jumps from 20% to 45% one week, then back to 25% the next. What's the right read?

Exactly. Citation shares swing fast: the share of ChatGPT responses citing Reddit fell from ~60% to ~10% in about six weeks.^[5] One reading is noise; the loop is the signal.

Q3 Which set of metrics actually tells you "are we in the answer, and how?"

Yes. Those four are scoreable by hand and map directly to AI answers. Traditional web and SEO metrics don't see the answer box at all.

06 The honest caveat

Measure trends, not screenshots

The data is genuinely noisy. The same prompt can return different answers minutes apart, and engines are still bad at attribution. When Columbia Journalism Review tested eight AI search tools across 1,600 queries, the tools answered more than 60% of them incorrectly, often confidently.^[6] So never re-architect your strategy off one run. The loop exists precisely to average out the noise.
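One simple way to read the trend rather than the single reading is a rolling average over recent runs. The sketch below uses the 20% → 45% → 25% swing from the quiz as its data; the three-run window is an arbitrary assumption, not a recommendation from the lesson.

```python
from statistics import mean

def rolling_mean(values, window=3):
    """Average each reading with up to (window - 1) readings before it."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(mean(chunk))
    return smoothed

# Presence rate in percent for three consecutive weekly runs.
weekly_presence = [20, 45, 25]

trend = rolling_mean(weekly_presence)
spread = max(weekly_presence) - min(weekly_presence)

for raw, smooth in zip(weekly_presence, trend):
    print(f"raw {raw}%  rolling mean {smooth}%")
print(f"spread across runs: {spread} points")
```

The raw readings swing by 25 points while the smoothed series moves far less, which is the practical meaning of acting on the trend across runs.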

You can graduate to tools when it's worth it. Platforms like Profound, Otterly, Peec, Ahrefs Brand Radar, and Semrush's AI toolkit automate this across more engines and prompts (useful at scale), but they measure the same four signals you just scored by hand.^[7] Start manual: a loop you actually run beats a dashboard you bought and ignore.

I built a free AI Visibility Check that runs this measurement for mortgage professionals and shows the citations behind each answer.

Course complete

That's the whole loop. You can now see your pages the way an engine does, and act on it:

Lesson 1: how engines retrieve, rerank, and cite passages, not pages.
Lesson 2: engineering the citable passage.
Lesson 3: the off-site consensus AI actually quotes.
Lesson 4: making sure the crawler can reach you at all.
Lesson 5: measuring whether any of it worked, on a loop.

None of it is a one-time fix. Run the loop, find the gap, make one move, measure again. That's GEO as a habit, and it compounds.

Sources

[1] Semrush / Datos, AI Overviews study (10M+ keywords): AI Overviews appeared on ~6.5% of US queries in Jan 2025, peaked ~24.6% in July, and settled ~15.7% by Nov 2025. AI Overviews crossed 2B+ monthly users (Google, July 2025).
[2] Adobe Analytics, "The explosive rise of generative-AI referral traffic" (2025): US AI-referral traffic grew >10× from Jul 2024 to Feb 2025, roughly doubling every 2 to 3 months; some retail verticals grew >30×.
[3] Microsoft Clarity data (1,200+ sites), reported via Digiday, "The state of AI referral traffic in 2025": LLM visitors converted to sign-ups at ~1.66% vs ~0.15% from search; AI is still ~1% of overall web traffic (Conductor/Similarweb), with ChatGPT ~87% of AI referrals.
[4] Manual-measurement methodology: SolCrys, "How to track brand visibility in ChatGPT": fixed 30-prompt set across discovery/comparison/decision intents, run on the consumer surface (not the API) in fresh conversations, scored for mention rate, share of voice, and cited sources.
[5] Semrush, "Most-cited domains in AI search" (230K prompts, 100M+ citations, 13 weeks): ChatGPT's citation of Reddit fell from ~60% of responses in early Aug 2025 to ~10% by mid-September; Wikipedia fell from ~55% to under 20% in the same window. Citation shares now move in weeks.
[6] Columbia Journalism Review (Tow Center), "We compared eight AI search engines. They're all bad at citing news." (2025): across 1,600 queries the tools were collectively wrong on >60%, frequently with unwarranted confidence; several routinely cited fabricated or broken URLs.
[7] Tool landscape (for scaling the loop): see Ahrefs' overview of AI-visibility tools: Profound, Otterly.ai, Peec AI, Ahrefs Brand Radar, and Semrush's AI toolkit track presence, share of voice, citations, and sentiment across multiple engines. They automate the same four signals covered here.

end of lesson 5 · end of course