If you’ve never run a geo holdout incrementality test, your platform-reported ROAS is almost certainly inflated by 30-90%.
That’s not a guess. It’s the range we’ve measured across every brand we’ve tested in the last 18 months. Meta over-attributes by 30-50% on retargeting-heavy programs. Google branded search over-attributes by 60-90%. TikTok view-throughs over-attribute by 40-70%. The numbers in your dashboard are not lies, exactly: they’re optimistic representations of an unknowable counterfactual.
The geo holdout is the only practical way to find out what your channels are actually contributing.
This is the playbook. If you’ve read “Your Attribution Data Is Lying To You”, this is the deeper how-to.
What you need before you start
- £500k+ annual revenue (smaller brands lack the statistical power for clean reads)
- Operations across multiple geographies (UK regions, EU countries, US states, etc.)
- A reliable single source of truth for revenue (Shopify / Stripe / your accounting system — not a platform dashboard)
- 14-21 days you’re willing to give the test
- Willingness to live with 1-3% revenue variance during the test window
Step 1: Pick the channel to test
Start with the channel that takes the largest share of budget AND that you most assume is driving revenue. For most DTC brands that’s Meta. For most B2B brands it’s Google branded search.
The bigger the assumed contribution, the more there is to learn from the test. Don’t waste your first geo holdout on the channel you already suspect is underperforming — start with the sacred cow.
Step 2: Design the holdout
Three decisions to make:
2a. Holdout vs control split
Standard: 15-25% of comparable markets in the holdout (ads off), 75-85% in the control (ads on as normal). Smaller holdout = lower revenue risk but less statistical power. We default to 20%.
2b. Geographic groupings
You need at least 8-12 comparable markets to build statistically valid groups. For UK-only brands: split by region (NUTS 2, known as ITL 2 in the UK since 2021). For US: split by DMA or state. For multi-country: pick a set of similar-volume countries.
Match holdout and control on three dimensions:
- Historical revenue per market
- Average order value per market
- Category mix per market
Use the last 6 months of data. Don’t use cherry-picked windows.
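One way to keep the split honest is to let code pick the holdout markets instead of choosing them by eye. The sketch below is a minimal illustration, not a prescribed method: it ranks markets by trailing 6-month revenue, cuts the ranking into strata of five, and randomly sends one market per stratum to the holdout (which gives the 20% default). The function name `assign_holdout`, the `stratum_size` and `seed` parameters, and the market figures are all hypothetical. It matches on revenue only, so AOV and category mix still need a separate check.

```python
import random


def assign_holdout(markets, stratum_size=5, seed=0):
    """Split markets into (holdout, control), matched on historical revenue.

    markets: dict of market name -> trailing 6-month revenue.
    One market per full stratum goes to the holdout, so the default
    stratum_size of 5 puts 20% of markets in the holdout.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    ranked = sorted(markets, key=markets.get, reverse=True)
    holdout = []
    for i in range(0, len(ranked), stratum_size):
        stratum = ranked[i:i + stratum_size]
        if len(stratum) == stratum_size:  # an incomplete tail stays in control
            holdout.append(rng.choice(stratum))
    control = [m for m in ranked if m not in holdout]
    return holdout, control


# Hypothetical trailing 6-month revenue per market (GBP).
markets = {
    "A": 410_000, "B": 395_000, "C": 380_000, "D": 360_000, "E": 350_000,
    "F": 240_000, "G": 230_000, "H": 225_000, "I": 210_000, "J": 200_000,
}
holdout, control = assign_holdout(markets)
print("holdout:", holdout)
print("control:", control)
```

With ten markets this returns two holdout markets, one from the higher-revenue half and one from the lower-revenue half, and eight control markets.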
2c. Test duration
14-21 days is the sweet spot. Less than 14 days is too noisy (a single bad week skews everything). More than 21 days and you’re losing too much revenue to call it a test.
Avoid promotional windows (Black Friday, big launch, sale event). Test during a normal trading period.
Step 3: Run the test
Day 0:
- In the platform (Meta, Google, etc.), exclude the holdout geos from your active campaigns.
- Note: it’s not enough to pause campaigns. You need to actively exclude the geos so the platform stops bidding there.
- Take a snapshot of all relevant numbers: 7-day rolling revenue per geo, AOV per geo, new customer rate per geo, total marketing spend in test channel.
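The Day 0 snapshot can be as simple as a small script run against your order export. The following is a sketch under assumptions: orders arrive as `(order_date, geo, order_value)` tuples, and `baseline_snapshot` is a hypothetical helper name. It covers 7-day revenue and AOV per geo; new customer rate and channel spend are left out for brevity and would need their own fields.

```python
from collections import defaultdict
from datetime import date, timedelta


def baseline_snapshot(orders, day0, window_days=7):
    """Per-geo revenue, order count and AOV for the window ending before day0.

    orders: iterable of (order_date, geo, order_value) tuples.
    """
    start = day0 - timedelta(days=window_days)
    revenue = defaultdict(float)
    count = defaultdict(int)
    for order_date, geo, value in orders:
        if start <= order_date < day0:  # day0 itself is excluded
            revenue[geo] += value
            count[geo] += 1
    return {
        geo: {
            "revenue": revenue[geo],
            "orders": count[geo],
            "aov": revenue[geo] / count[geo],
        }
        for geo in revenue
    }


# Made-up orders for illustration.
orders = [
    (date(2024, 3, 1), "North", 100.0),   # before the 7-day window
    (date(2024, 3, 4), "North", 80.0),
    (date(2024, 3, 8), "North", 120.0),
    (date(2024, 3, 9), "South", 50.0),
    (date(2024, 3, 10), "South", 70.0),   # on Day 0, so not in the baseline
]
snapshot = baseline_snapshot(orders, day0=date(2024, 3, 10))
print(snapshot)
```

Save the output somewhere you can't accidentally overwrite it; it is the reference point for every comparison after the test.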
Days 1-21:
- Don’t touch anything else. Don’t launch new creative. Don’t run a sale. Don’t change pricing.
- Daily: log total revenue, holdout vs control. Don’t make decisions on a single day’s data.
Day 22 (the day after the test ends):
- Re-enable the channel in the holdout geos.
- Pull the data.
Step 4: Calculate incrementality
The maths is simpler than you’d expect.
4a. Normalise to revenue-per-geo-per-day
- Holdout total revenue / number of holdout geos / test days (21 here) = avg daily revenue per holdout geo
- Control total revenue / number of control geos / test days = avg daily revenue per control geo
4b. Calculate the gap
- Daily revenue gap = control daily revenue – holdout daily revenue
- This is the channel’s incremental daily contribution per geo
4c. Scale to total spend
- Total channel spend in control during the test window = X
- Total incremental revenue produced = (daily revenue gap × number of control geos × test days)
- Incremental ROAS (iROAS) = total incremental revenue / X
4d. Compare
- Platform-reported ROAS (whatever your Meta/Google dashboard says) vs iROAS (what you just calculated).
- Platform ROAS divided by iROAS is your over-attribution multiplier.
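The arithmetic in Step 4 can be sketched in a few lines of Python. The function name `incremental_roas` and every figure below are made up for illustration. The over-attribution multiplier is taken here as platform ROAS divided by iROAS, which is one reasonable reading of the comparison in 4d.

```python
def incremental_roas(holdout_revenue, n_holdout, control_revenue, n_control,
                     control_spend, days=21):
    """Return (iROAS, total incremental revenue) following steps 4a-4c."""
    # 4a: normalise to average daily revenue per geo
    holdout_daily = holdout_revenue / n_holdout / days
    control_daily = control_revenue / n_control / days
    # 4b: per-geo daily gap attributed to the channel
    daily_gap = control_daily - holdout_daily
    # 4c: scale back up to the geos where the spend actually happened
    incremental_revenue = daily_gap * n_control * days
    return incremental_revenue / control_spend, incremental_revenue


# Made-up figures: 2 holdout geos, 8 control geos, 21-day test.
iroas, incremental = incremental_roas(
    holdout_revenue=84_000, n_holdout=2,
    control_revenue=420_000, n_control=8,
    control_spend=40_000,
)
platform_roas = 4.0  # hypothetical dashboard figure
multiplier = platform_roas / iroas  # 4d, read as a ratio

print(f"iROAS: {iroas:.2f}x")
print(f"over-attribution multiplier: {multiplier:.2f}x")
```

With these inputs the per-geo gap is £500 a day, which scales to £84,000 of incremental revenue and an iROAS of 2.1x against a platform-reported 4.0x.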
Step 5: Sanity-check the result
Before you reallocate budget on the back of one test, sanity-check:
- Statistical significance. If your holdout had less than ~£100k of revenue during the test window, the read is noisy. Run again with a longer window.
- External shocks. Did anything happen in the holdout geos that didn’t happen in control? Weather, news, competitor activity, distribution issues.
- Spillover. If your customers move between geos (e.g. London commuters), spillover dilutes the test. Add a buffer week between data periods or use bigger geographic units.
- Replication. One test is suggestive. Two tests in a row producing similar iROAS is convincing.
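For a rough read on whether the gap could be chance, one option is a permutation test. This is a swapped-in technique, not part of the playbook above. It asks: if the holdout label had been given to any other set of geos of the same size, how often would the control-minus-holdout gap be at least as large as the one you observed? The sketch assumes you have average daily revenue per geo for the test window; `permutation_p_value` and the figures are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(daily_revenue, holdout_geos):
    """One-sided p-value for the control-minus-holdout gap.

    daily_revenue: dict of geo -> average daily revenue during the test.
    holdout_geos: the geos that actually had ads switched off.
    """
    geos = list(daily_revenue)

    def gap(held):
        control = [daily_revenue[g] for g in geos if g not in held]
        holdout = [daily_revenue[g] for g in held]
        return mean(control) - mean(holdout)

    observed = gap(set(holdout_geos))
    # Every alternative way of choosing the same number of holdout geos.
    splits = [set(c) for c in combinations(geos, len(holdout_geos))]
    as_extreme = sum(1 for s in splits if gap(s) >= observed)
    return as_extreme / len(splits)


# Made-up average daily revenue per geo (GBP); I and J were the holdout.
daily_revenue = {
    "A": 2600.0, "B": 2550.0, "C": 2500.0, "D": 2480.0, "E": 2450.0,
    "F": 2520.0, "G": 2470.0, "H": 2530.0, "I": 1990.0, "J": 2010.0,
}
p_value = permutation_p_value(daily_revenue, ["I", "J"])
print(f"p = {p_value:.3f}")
```

With 2 holdout geos out of 10 there are only 45 possible splits, so the smallest achievable p-value is 1/45 (about 0.022). That floor is a reminder of why more markets give a cleaner read. Comparing raw revenue also assumes the geos really are comparable; dividing each geo by its Day 0 baseline first may be a fairer input.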
Step 6: Reallocate with confidence
If iROAS is materially below platform ROAS (the most common outcome):
- Reduce spend in that channel by 20-30%.
- Reallocate to channels you haven’t yet tested.
- Re-test the original channel in 90 days at the lower spend level — the over-attribution often shrinks at lower spend (you’re no longer paying for as much overlap).
If iROAS is roughly in line with platform ROAS (rare but happens — usually for non-retargeting prospecting on Meta or non-brand search on Google):
- You can confidently scale this channel.
- Re-test annually to catch drift.
Common mistakes
- Mistake 1: Running the test during a promotional window. The data is unusable.
- Mistake 2: Pausing campaigns instead of geo-excluding. Some platforms keep bidding via cached audiences for days.
- Mistake 3: Comparing platform-reported revenue, not total business revenue. Circular.
- Mistake 4: Drawing conclusions from a single 7-day test. Too noisy.
- Mistake 5: Running the test on a channel that takes <5% of spend. Not enough signal.
- Mistake 6: Not snapshotting baseline numbers before Day 0. Can’t compare cleanly.
What to do with the result internally
Brace yourself. The result will probably embarrass someone.
If your Meta agency has been claiming 4x ROAS and the test shows iROAS of 1.8x, the conversation is going to be uncomfortable. The iROAS isn’t a judgement of the agency’s quality — it’s a judgement of how much over-attribution exists in the platform’s modelled view.
The mature way to handle this:
- Frame the test as a baseline you’ll repeat annually.
- Reset shared expectations — agencies should be optimising for iROAS or net new customers, not platform ROAS.
- Use the result to inform budget allocation, not to fire people. The agency probably did fine work — the platform was misleading both of you.
The 30-second summary
- Geo holdout test = ads off in 15-25% of comparable markets for 14-21 days.
- Compare control vs holdout total business revenue (not platform-attributed).
- Calculate iROAS = (per-geo daily revenue gap between control and holdout × control geos × test days) / spend in control.
- Platform ROAS / iROAS = over-attribution multiplier.
- Most channels show 30-90% over-attribution. Reallocate budget accordingly.
- Re-test annually. The picture changes as your channel mix and audience evolve.
FAQ
How much revenue do I lose during a geo holdout test?
If iROAS in the holdout channel is 2.0x, you lose roughly (channel spend withheld in the holdout geos × 2.0) during the test period, typically 1-3% of total business revenue across a 21-day window if the holdout is 20% of geos. Most brands recoup this within a quarter via better budget allocation post-test.
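To put a number on that for your own business, the estimate can be sketched as below. All figures are made up, `holdout_revenue_cost` is a hypothetical helper, and the calculation assumes channel spend is spread across geos in proportion to the holdout share, which will not hold exactly if your markets differ in size.

```python
def holdout_revenue_cost(total_revenue, channel_spend, holdout_share, iroas):
    """Estimate revenue given up during the test window.

    total_revenue: total business revenue over the test window.
    channel_spend: what the channel would normally spend across all geos
                   in the same window.
    holdout_share: fraction of geos in the holdout (e.g. 0.2).
    iroas: assumed incremental ROAS of the channel.
    """
    withheld_spend = channel_spend * holdout_share
    lost_revenue = withheld_spend * iroas
    return lost_revenue, lost_revenue / total_revenue


# Made-up 21-day figures (GBP).
lost, share = holdout_revenue_cost(
    total_revenue=1_000_000,
    channel_spend=50_000,
    holdout_share=0.2,
    iroas=2.0,
)
print(f"lost revenue: £{lost:,.0f} ({share:.1%} of total)")
```

Here £10,000 of withheld spend at 2.0x costs about £20,000, or 2% of revenue in the window. The withheld spend itself is saved, so the net cost is lower than the revenue figure suggests.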
What if I only operate in one country / one region — can I still run a geo holdout?
Yes. Split by sub-regions (UK by NUTS 2, US by DMA; EU-wide brands by country). Single-city brands can run user-level holdouts using customer-list exclusions, but those are noisier and harder to design well. Get bigger geographically before relying on this method.
Can I run a geo holdout on TikTok or YouTube?
Yes for both. TikTok exclusions are slightly less granular than Meta’s geo controls; verify in the campaign setup. YouTube via Google Ads supports DMA-level geo targeting and exclusion.
What’s the difference between a geo holdout and a Meta conversion lift study?
A Meta conversion lift study (the platform’s built-in version) randomises individual users into test/control. It’s faster but only measures within-Meta effects and uses Meta’s modelling. A geo holdout measures total business effect, channel-agnostic. Both are valid; geo holdouts are more trustworthy when the question is “is this channel earning its budget”.
How often should I re-test?
Annually for top-spend channels, more often after major attribution changes (iOS update, platform algorithm shift, large change in channel mix). Quarterly is overkill unless something dramatic has shifted.
If you’d like a second pair of eyes on your test design before you press go, request a free 15-minute Loom audit via the contact page. We’ll walk through your specific test plan, flag risks, and suggest improvements.


