Meta Ads · TikTok Ads · Creative testing frameworks

Stop guessing what creative testing framework to run.

The creative testing methods media buyers actually use across apps, e-commerce and mobile games. Automate the one you pick and make your ads perform better.

  • Each card: setup, pause criteria, budget floor & the common mistake
  • Free, no email gate — automate any method when you’re ready
14 methods · 2 platforms · 20+ auto-rules · Free, no gate
Why creative testing keeps breaking

Creative testing is brutal on every front.

“Added new ads to a profitable account. Dropped it overnight — 1.38 → 0.75 ROAS, $18k and 30 days to recover.”

Winners are rare — everything rides on them.

Most creatives never win. A handful carry the whole account, then fatigue in 7–14 days.

Which method even works now?

Andromeda rewrote Meta overnight. TikTok plays by other rules. Nobody knows how to test creatives in 2026.

Every test eats a day by hand.

Set up, launch, babysit, score, log — by hand, for every test. Skip the last step and the lesson's gone.

The playbook

14 frameworks that ship winners
Find yours.

Setup, pause thresholds, budget floor, common mistake — the full playbook per method.

14 production-grade methods.

3-3-3, Hooks Test, Multi-Variant Battery, Mirror-BAU, Cheap Geo, Conversion Lift — what real media buyers run.

Free · no email gate

Filtered to your reality.

Chip filters by goal, budget, platform. No theoretical tests that won't fit your account.

Match in 30 seconds

Every card is runnable today.

Setup, pause thresholds, budget floor, common mistake — full playbook per method.

Manual or automated · your call
Your framework · our engine · zero ops

Pick a framework. Set it on autopilot. Faster winners. Hours back.

Set thresholds once. Launch · pause · promote · refresh · digest — runs automatically for every method.

Automatic, always.

Pause weak. Clone winners 2×. Refresh fatigue. Overnight.

20+ rules out of the box

First winners in 72 hours.

Worst-performer pause Day 3. Top-2 promoted Day 5. No waiting three weeks for signal.

Same gates · every test

Iterate weekly, nothing drops.

Refresh cadence built in. Andromeda gets its 20–30 fresh variants. Nothing rots in BAU.

Frequency · CTR · CPA · hook-rate · 24/7

Hours back from Ads Manager.

Stop logging in at midnight to pause an ad set. Spend the hours on new hooks, offers, angles.

One Slack digest · three dashboards gone
↳ Two ways to start
↳ One week on autopilot: you set thresholds once · Scalemate does the rest
  1. Mon 09:00: Bulk launch (30 creatives from Drive · auto-named)
  2. Mon 09:42: First test live (all ad sets running)
  3. Tue: Auto-pause (ad sets below CTR threshold)
  4. Wed (Day 3): Worst-performer pause (CPA > 2× target)
  5. Thu (Day 4): Top performers ranked (by CPA per ad set)
  6. Fri (Day 5): Winner promoted (auto-clones into BAU at 2× budget)
  7. Sat (ongoing): Refresh monitor (watches frequency · CTR · CPA)
  8. Daily 09:00: Slack digest (paused · promoted · spend)

All 14 creative testing methods

Production-tested creative testing methods for Meta + TikTok — sorted by goal, budget and platform. Pick one, run it by hand, or put it on autopilot.

Find winners fast
Meta · TikTok

The 3-3-3 Method

Your first winners in 72 hours. 3 ad sets × 3 creatives × 3 days — the fast, cheap way to find a starter winner. Works across subscription apps, e-commerce and gaming, at any account size.

Full playbook — setup, thresholds & common mistake
Best for

Meta & TikTok · subscription apps, e-commerce, gaming · $40K+/mo accounts.

Setup
  1. 3 distinct concepts — different hooks/angles, not 3 cuts of one ad.
  2. 3 ad sets, identical targeting + budget ($50–150/day each), 1 ad each.
  3. Run 72h, no edits.
  4. Day 4: pause ad sets at CPA > 1.5× target; scale the winner.
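
The Day 4 decision above can be sketched in a few lines of Python. This is a minimal illustration, not Scalemate's implementation: the function name `day4_verdict`, the input shape, and the choice to treat the lowest-CPA survivor as "the winner" are all assumptions.

```python
def day4_verdict(ad_sets, target_cpa, pause_multiple=1.5):
    """Split ad sets into paused / kept and name the one to scale.

    ad_sets: dict of name -> (spend, conversions) after the 72h run.
    An ad set with zero conversions has no finite CPA and is paused.
    """
    cpa = {}
    for name, (spend, conversions) in ad_sets.items():
        cpa[name] = spend / conversions if conversions else float("inf")

    paused = sorted(n for n, c in cpa.items() if c > pause_multiple * target_cpa)
    kept = sorted(n for n in cpa if n not in paused)
    # Assumption: the winner is the surviving ad set with the lowest CPA.
    winner = min(kept, key=cpa.get) if kept else None
    return {"paused": paused, "kept": kept, "winner": winner}


result = day4_verdict(
    {"Concept A": (300.0, 12), "Concept B": (300.0, 4), "Concept C": (300.0, 0)},
    target_cpa=30.0,
)
print(result)
```

With a $30 target, the pause line sits at $45: Concept A ($25 CPA) survives and is scaled, while B ($75) and C (no conversions) are paused.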
Strengths

Fast · cheap · forces 3 distinct concepts.

Trade-offs

Noisy under $100/day. 3 days = before learning phase exits. Tests creative only, not audience fit.

Common mistake

3 versions of one ad isn't 3-3-3 — variation's too narrow and Andromeda picks the wrong winner.

On autopilot with Scalemate
  1. Template — save a 3-3-3 structure: 3 ad sets, 1 ad each, identical targeting + budget.
  2. Bulk launch — swap in 3 Drive creatives, launch all 3 ad sets in 2 clicks.
  3. Auto-pause — Day 4, cut any ad set at CPA > 1.5× target.
  4. Auto-promote — winner clones into your scaling campaign at 2× budget.
  5. Slack — daily status + winner called on Day 4.
Budget: $40K+/mo · Duration: 3d · Creatives: 9 · Setup: 10 min
Flow: 3 concepts · 72h · CPA gate
  1. Bulk launch: 3 ad sets (Concept A, B, C; 1 ad each)
  2. Run 72 hours, no edits
  3. CPA ≤ 1.5× target? Evaluated day 4
  4. No: pause the set (loser). Yes: scale 2× (winner)
  5. Slack the team: winner ID + scale plan
  6. Log to Google Sheets: assign test status · build the library
Andromeda-ready
Meta · TikTok

Multi-Variant Battery (Andromeda Native)

Stop fighting the algorithm. Load 7–20 creatives — different formats and concepts — into one broad ad set and let Andromeda find the pockets.

Full playbook — setup, thresholds & common mistake
Best for

Shopping/Sales · broad audiences · post-Andromeda accounts.

Setup
  1. 1 broad campaign, demographics-only targeting.
  2. Single ad set, single objective.
  3. Load 7–20 creatives — different formats and concepts.
  4. Phase 1 — at 48h, pause any creative with CPA > 1.5× target.
  5. Phase 2 — at 72h, scale survivors with CPA ≤ target (2× + add to BAU); pause the rest.
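
The two CPA gates above can be expressed as two small filters. This is a sketch under stated assumptions: the function names and the dict-of-CPAs input are hypothetical, and the CPA figures are made up for illustration.

```python
def phase1_pause(cpa_by_creative, target_cpa):
    """48h gate: creatives to pause because CPA > 1.5x target."""
    return {c for c, cpa in cpa_by_creative.items() if cpa > 1.5 * target_cpa}


def phase2_split(cpa_by_creative, target_cpa):
    """72h gate: returns (scale, pause) among the Phase 1 survivors."""
    scale = {c for c, cpa in cpa_by_creative.items() if cpa <= target_cpa}
    return scale, set(cpa_by_creative) - scale


target = 20.0
at_48h = {"ugc_video": 18.0, "carousel": 35.0, "static": 26.0, "demo": 22.0}
cut_48h = phase1_pause(at_48h, target)

# Only survivors are re-read at 72h (illustrative numbers).
at_72h = {"ugc_video": 17.0, "static": 24.0, "demo": 20.0}
scale_72h, cut_72h = phase2_split(at_72h, target)
print(cut_48h, scale_72h, cut_72h)
```

Note the asymmetry: Phase 1 is lenient (1.5× target) so slow starters get a chance, while Phase 2 only scales creatives at or under target.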
Strengths

Algorithm-aligned. No manual audience-creative matching. Scales — more creatives = more pockets.

Trade-offs

Needs 7–20 creatives. Hard to explain wins to stakeholders. Hard to attribute to specific choices.

Common mistake

Loading 20 near-identical creatives. Andromeda needs real diversity (format + concept + angle), not 20 carousels.

On autopilot with Scalemate
  1. Template — broad targeting, single ad set + objective.
  2. Bulk launch — drag the full batch; all creatives load into one ad set.
  3. Phase 1 auto-pause — cut any creative with CPA > 1.5× target at 48h.
  4. Phase 2 auto-promote — scale 2× + clone to every BAU campaign for survivors with CPA ≤ target at 72h.
  5. Slack alert — new winner posted the moment a creative passes Phase 2.
Budget: $10–40K/mo · Duration: 3d · Creatives: 14 · Setup: 15 min
Flow: Broad campaign · 7–20 creatives · 2-phase CPA gates
  1. Broad campaign: 1 ad set · single objective
  2. Load 7–20 creatives: different formats × concepts
  3. Andromeda allocates: finds the pockets
  4. Phase 1 · 48h CPA check: pause creative if CPA > 1.5× target; otherwise continue to Phase 2
  5. Phase 2 · 72h winner check: pause if CPA > target; otherwise scale 2× + BAU (Slack: new winner ✓)
Find winners fast
Meta · TikTok

The 3-2-2 Method (5-Day Sprint)

A scaling-ready verdict in 5 days, not 14. Best when you're testing communication hypotheses in your creative — 3 different angles, 2 variations each.

Full playbook — setup, thresholds & common mistake
Best for

Testing creative communication hypotheses · 3 angles × 2 variations · a primary + backup winner in 5 days · $40K+/mo accounts.

Setup
  1. 3 ad sets × 2 creatives, 5-day run.
  2. Identical audience + budget ($150–300/day each — higher budget compresses signal).
  3. 2 creatives compete inside each ad set.
  4. Day 5: pick winner per ad set by CPA; move top + #2 to scaling.
Strengths

Faster than the 14-day version. In-ad-set variation suits Andromeda.

Trade-offs

Tighter signal = more noise. Needs a daily budget of at least 3× your target CPA per ad set.
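
The budget floor is simple arithmetic, sketched here in Python. The helper name `sprint_budget_floor` is hypothetical; the 3 ad sets, 5 days and 3× multiple come from the method as described.

```python
def sprint_budget_floor(target_cpa, ad_sets=3, days=5, cpa_multiple=3):
    """Minimum budgets implied by '3x target CPA per ad set per day'."""
    daily_per_ad_set = cpa_multiple * target_cpa
    daily_total = daily_per_ad_set * ad_sets
    return {
        "daily_per_ad_set": daily_per_ad_set,
        "daily_total": daily_total,
        "sprint_total": daily_total * days,
    }


floor = sprint_budget_floor(target_cpa=50)
print(floor)
```

At a $50 target CPA the floor is $150/day per ad set, which is the low end of the $150–300/day range in the setup, and $2,250 for the full sprint.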

Common mistake

Throwing random ads into each ad set. Each ad set should test one hypothesis — e.g. same angle, different CTAs — so you learn why the winner won, not just which ad won.

On autopilot with Scalemate
  1. Template — 3 ad sets, 2 ads each, $150–300/day, identical targeting.
  2. Bulk launch — load 6 Drive creatives (2 per ad set) in 1 click.
  3. Auto-pause — Day 3, cut the worst ad sets at CPA > 1.5× target.
  4. Day 5 — Scalemate surfaces winner + #2 per ad set automatically.
  5. Auto-promote — top winner + backup clone to scaling at 2–3× budget.
Budget: $40K+/mo · Duration: 5d · Creatives: 6 · Setup: 15 min
Flow: 5-day verdict · worst-performer pause · top 2
  1. Bulk launch: 3 ad sets · 6 ads (Ad sets A, B, C; 2 ads each)
  2. Day 3, worst-performer pause: pause if CPA > 2× target
  3. Worst paused (removed day 3); survivors run on to day 5
  4. Day 5, rank top 2: best CPA per ad set
  5. Scale best ad: own ad set · 2–3×
  6. Keep 2nd best: fallback if leader tires
  7. Add top 2 → BAU: into all active campaigns
Find winners fast
Meta · TikTok

Hooks Test

Hook rate decides what spend a creative gets early — the body decides what converts to purchase. Hold a body that already converts, swap hooks, scale the one that unlocks volume.

Full playbook — setup, thresholds & common mistake
Best for

Video-heavy teams (mobile UA, DTC) · refining a proven concept · $5K+/mo accounts.

Setup
  1. Take one winning video.
  2. Make 3–5 variants — different first 3 seconds only.
  3. Keep body + end identical.
  4. Run in an Advantage+ CBO campaign — 1 ad set per hook, 1 ad each.
  5. After 1K+ impressions/variant, cut bottom 50% by hook-rate.
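
The cut in step 5 can be sketched as follows. Two assumptions are baked in and are not from the page: hook rate is computed as 3-second video views divided by impressions (a common definition, but check how your reporting defines it), and "bottom 50%" is rounded down for odd counts. Function names are hypothetical.

```python
def hook_rate(three_sec_views, impressions):
    # Assumed definition: share of impressions that watched 3+ seconds.
    return three_sec_views / impressions


def cut_bottom_half(variants, min_impressions=1000):
    """variants: name -> (three_sec_views, impressions).

    Returns the names to cut, or [] while any variant is still
    below the impression floor (no verdict yet).
    """
    if any(imps < min_impressions for _, imps in variants.values()):
        return []
    ranked = sorted(variants, key=lambda n: hook_rate(*variants[n]))
    return ranked[: len(ranked) // 2]


variants = {
    "Hook A": (300, 1200),
    "Hook B": (420, 1400),
    "Hook C": (180, 1000),
    "Hook D": (330, 1100),
    "Hook E": (250, 1250),
}
to_cut = cut_bottom_half(variants)
print(to_cut)
```

Because all variants sit in the same campaign, the ranking stays a within-test comparison, which is the point of the common mistake noted for this method.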
Strengths

Isolates the hook cleanly. Compounds a proven concept. Fast once you have a base winner.

Trade-offs

Needs editing capacity. Refines, doesn't discover. Hook-rate ≠ ROAS.

Common mistake

Comparing hook rates across different campaigns instead of within one test — hook-rate is context-dependent.

On autopilot with Scalemate
  1. Template — Advantage+ CBO campaign, 1 ad set per hook, 1 ad each, primary metric = hook rate.
  2. Bulk launch — drag 5 Drive variants into the ad set, launch.
  3. Auto-pause — cut any variant whose hook rate is below 1.2× control after 2K impressions.
  4. Winner — reporting tags the top hook for the next batch.
  5. Next cycle — Slack alert when one variant clears 1.5× control to prompt iteration.
Budget: $5K+/mo · Duration: 7d · Creatives: 5 · Setup: 20 min
Flow: 1 body · 5 hook variants
  1. Winning concept: keep body, swap hook
  2. 5 hook variants · same body (Hooks A–E)
  3. Advantage+ campaign · CBO: 1 ad set · 1 ad per hook
  4. Auto-pause rule: hook-rate > 1.2× control · CPA ≤ target
  5. Variant paused: weak hook cut
  6. Scale winner: ≥1.5× control by hook-rate
Validate before scaling
Meta

Meta Conversion Lift Test

Settle 'did the ad actually cause that sale?' with finance. A true hold-out measures incremental conversions — but it's gated to enterprise teams with a Meta rep and $30K+ for the study.

Meta-gated · measurement-only — the Lift study runs through Meta, not Scalemate.

Full playbook — setup, thresholds & common mistake
Best for

Enterprise w/ Meta rep · $75K+/mo accounts · validating a big change before scaling 2–3×.

Setup
  1. Request a Lift study via your Meta rep.
  2. Meta splits users into test (sees ads) vs hold-out (doesn't).
  3. Run 4+ weeks for a valid sample.
  4. Define the conversion event.
  5. Meta reports incremental conversions + cost per incremental conversion.
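
The two numbers in step 5 follow from basic hold-out arithmetic, sketched below. This is a simplified illustration and not Meta's actual methodology, which may apply additional statistical modelling; the function name and the example figures are invented.

```python
def lift_summary(test_users, test_conv, holdout_users, holdout_conv, spend):
    """Hold-out arithmetic: what did the ads add beyond the baseline?"""
    # Conversions the test group would have produced with no ads,
    # scaled from the hold-out group's conversion rate.
    expected_without_ads = holdout_conv * test_users / holdout_users
    incremental = test_conv - expected_without_ads
    return {
        "incremental_conversions": incremental,
        "lift_pct": 100 * incremental / expected_without_ads,
        "cost_per_incremental": spend / incremental if incremental > 0 else None,
    }


summary = lift_summary(
    test_users=900_000, test_conv=5_400,
    holdout_users=100_000, holdout_conv=500,
    spend=30_000,
)
print(summary)
```

In this example the platform-reported CPA would be about $5.56 (30,000 / 5,400), but only 900 of those conversions are incremental, so the cost per incremental conversion is about $33.33. That gap is what the study is meant to surface.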
Strengths

Highest-quality incrementality Meta offers. Separates 'ad drove this' from 'would've converted anyway.' Ends attribution debates.

Trade-offs

Gated — needs a Meta rep. 4+ week cycle. $30K+ for the study. One decision at a time, not a daily tool.

Common mistake

Assuming it's the same as the A/B Test option. A/B compares two live variants; Lift uses a true hold-out. A/B can't measure incrementality.

On autopilot with Scalemate
  1. Pre-Lift — Scalemate locks current variants (no rotation during the 4-week window).
  2. During — dashboard reports CPA + ROAS in parallel so finance sees both.
  3. Post-Lift — flip 'scaling mode' once incrementality is confirmed.
  4. Auto-scale — +20%/day when ROAS > target × 1.1, auto-revert on a 15% drop.
  5. Slack — daily scaling-action summary for team + finance.
Budget: $75K+/mo · Duration: 28d · Setup: by request
Flow: Hold-out study · 4 weeks · incrementality
  1. Meta rep request: Lift study via your rep
  2. Split: test group (sees ads) vs hold-out (no ads, control)
  3. Run 4+ weeks for a valid sample
  4. Define event: incremental conversions
  5. No lift: do not scale. Lift confirmed: scale 2–3×
Validate before scaling
Meta · TikTok

Bulk Creative Test in CBO (2-Phase Funnel Progression)

Test 12–75 creatives a cycle without funding duds. Phase 1 cuts the bottom 50–70% on cheap signal; Phase 2 validates survivors on revenue. For R&D + AI-creative pipelines.

Full playbook — setup, thresholds & common mistake
Best for

High-volume R&D teams · 30+ creatives/cycle · AI-generation pipelines · $10K+/mo accounts.

Setup
  1. 1 CBO campaign, 4–5 ad sets (broad or diversified lookalikes).
  2. 3–15 ads per ad set (12–75 total).
  3. Phase 1 (3–5d): let CBO allocate; cut bottom 50–70% on upper-funnel signal (installs / registrations / add-to-cart).
  4. Phase 2 (3–7d): keep top 20–40%; judge on lower-funnel (purchases / trials / paying users).
  5. Move final 5–10 winners to scaling.
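
A minimal sketch of the two phases, assuming Phase 1 ranks creatives by an upper-funnel rate and Phase 2 applies a purchase-CPA ceiling. The function names, the 40% keep rate, and all figures are illustrative assumptions.

```python
def keep_top_percent(scores, keep_pct):
    """Phase 1: keep the top keep_pct% by score (higher is better)."""
    keep_n = max(1, (len(scores) * keep_pct + 99) // 100)  # integer ceiling
    return sorted(scores, key=scores.get, reverse=True)[:keep_n]


def phase2_winners(purchase_cpa, survivors, target_cpa):
    """Phase 2: survivors whose lower-funnel CPA is at or under target."""
    return [c for c in survivors if purchase_cpa[c] <= target_cpa]


# Phase 1 signal: add-to-cart rate per creative.
atc_rate = {"a": 0.08, "b": 0.05, "c": 0.03, "d": 0.02, "e": 0.01}
survivors = keep_top_percent(atc_rate, keep_pct=40)

# Phase 2 signal: purchase CPA, measured only for survivors.
purchase_cpa = {"a": 60.0, "b": 28.0}
winners = phase2_winners(purchase_cpa, survivors, target_cpa=30.0)
print(survivors, winners)
```

Creative "a" tops Phase 1 but fails Phase 2, which is the case the second phase exists to catch.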
Strengths

Tests volume efficiently (12–75 vs 9). 2-phase saves budget. Built for AI creative pipelines. Matches how CBO allocates.

Trade-offs

Needs 12–75 creatives. Phase 1 metric choice is critical — wrong signal cuts real winners. 6–12 day cycle.

Common mistake

Treating Phase 1 metrics as final. High install rate ≠ high purchase rate. Phase 2 exists to weed those out — don't skip it.

On autopilot with Scalemate
  1. Template — CBO campaign, 4–5 ad sets, single objective.
  2. Bulk launch — drag 12–75 Drive creatives; Scalemate spreads 3–15 per ad set.
  3. Phase 1 auto-pause (Day 3) — cut bottom 50–70% on upper-funnel events.
  4. Phase 2 auto-pause (Day 5–8) — cut survivors failing lower-funnel CPA.
  5. Auto-promote — final winners clone to scaling at 2–3×; batch tagged 'winner library.'
Budget: $10K+/mo · Duration: 12d · Creatives: 44 · Setup: 30 min
Flow: CBO · 12–75 ads · 2-phase funnel
  1. CBO campaign: auto-budget · 4–5 ad sets (Ad sets 1–4; 3–15 ads each)
  2. Phase 1 · day 3: upper-funnel signal
  3. Cut 50–70% (weak upper-funnel); top 30–40% advance to Phase 2
  4. Phase 2 · day 5–8: revenue signal
  5. Cut on revenue (no purchases); scale 2–3× the 5–10 final winners
Cut losers fast
Meta · TikTok

Static vs Video Test

Stop guessing whether your offer wants video or static. 3 ad sets — static / video / mixed, same concept — tells you where to point production budget.

Full playbook — setup, thresholds & common mistake
Best for

New accounts · format transitions · vertical-specific calls (mobile UA, eCom Reels).

Setup
  1. 3 ad sets: static-only, video-only, mixed.
  2. Same concept across all 3.
  3. Same audience + budget ($100–200/day each).
  4. Run 7–14 days.
  5. Compare CPA, CTR, hook rate, watch-through by format.
Strengths

Ends the 'video always wins' assumption. Data-backed format strategy.

Trade-offs

Needs both formats produced. Result varies by funnel stage (TOFU = video, BOFU = static).

Common mistake

Calling video the winner on engagement when static had better CPA. Optimize for the business metric, not engagement.

On autopilot with Scalemate
  1. Template — 3 ad sets (static / video / mixed), identical targeting + budget.
  2. Bulk launch — drag 6 static + 6 video + 6 mixed; 6 per ad set in 1 click.
  3. Auto-pause — cut anything under 0.5% CTR after 500 impressions.
  4. Format reporting — dashboard shows aggregate CPA per format, not just per ad.
  5. Auto-promote — winning format's top 2 clone to scaling; lock the format default.
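
Step 4's "aggregate CPA per format" means total spend divided by total conversions for the format, which is not the same as averaging each ad's CPA. A small sketch, with a hypothetical function name and invented numbers:

```python
from collections import defaultdict


def cpa_by_format(ads):
    """ads: iterable of (format, spend, conversions) per ad.

    Returns format -> total spend / total conversions
    (None if the format has no conversions).
    """
    spend = defaultdict(float)
    conversions = defaultdict(int)
    for fmt, ad_spend, ad_conv in ads:
        spend[fmt] += ad_spend
        conversions[fmt] += ad_conv
    return {
        fmt: (spend[fmt] / conversions[fmt] if conversions[fmt] else None)
        for fmt in spend
    }


ads = [
    ("static", 200, 10),  # per-ad CPA 20.00
    ("static", 100, 2),   # per-ad CPA 50.00
    ("video", 250, 8),    # per-ad CPA 31.25
    ("video", 50, 2),     # per-ad CPA 25.00
]
format_cpa = cpa_by_format(ads)
print(format_cpa)
```

Here static wins on aggregate CPA ($25 vs $30), while a naive average of per-ad CPAs would point the other way ($35 vs about $28.13), because it gives a low-spend ad the same weight as a high-spend one.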
Budget: $10–40K/mo · Duration: 5d · Creatives: 18 · Setup: 20 min
Flow: Same concept · 3 formats · CTR + CPA gate
  1. Launch, same concept: 3 formats · equal budget (static only, video only, mixed; 6 ads each)
  2. Auto-pause ads: CTR < 0.5% · 500 impr
  3. Format reporting: aggregate CPA per format
  4. Scale top 2: clone winning format
  5. Lock format as default
Andromeda-ready
Meta · TikTok

Creative Refresh Cadence

Catch fatigue before it eats your CPA. Watch frequency, CTR, CPA and hook-rate on every live creative and swap the moment one breaks. Mandatory post-Andromeda — cycles dropped to 7–14 days.

Full playbook — setup, thresholds & common mistake
Best for

Any team running winners 30+ days · critical post-Andromeda.

Setup
  1. Track per-creative weekly: frequency, CPA, CTR, hook rate.
  2. Replace when frequency > 3.0, CTR −20% from peak, CPA +30%, or hook-rate −15% (video).
  3. Swap in a variation of the winner, not a brand-new concept.
  4. Pause the fatigued creative immediately.
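
The four triggers in step 2 can be checked with a function like the one below. It is a sketch, not the product's rule engine: the name `fatigue_triggers` is hypothetical, and it assumes the CPA increase is measured against the creative's own baseline CPA, which the page does not specify.

```python
def fatigue_triggers(frequency, ctr, peak_ctr, cpa, baseline_cpa,
                     hook_rate=None, peak_hook_rate=None):
    """Return the names of the fatigue triggers that fired (any one is enough)."""
    fired = []
    if frequency > 3.0:
        fired.append("frequency")
    if ctr < 0.80 * peak_ctr:            # CTR down more than 20% from peak
        fired.append("ctr")
    if cpa > 1.30 * baseline_cpa:        # CPA up more than 30% (assumed baseline)
        fired.append("cpa")
    # Hook rate applies to video only, so it is optional.
    if hook_rate is not None and peak_hook_rate is not None:
        if hook_rate < 0.85 * peak_hook_rate:
            fired.append("hook_rate")
    return fired


fired = fatigue_triggers(frequency=3.4, ctr=0.011, peak_ctr=0.012,
                         cpa=42.0, baseline_cpa=30.0)
print(fired)
```

Returning the list of fired triggers, rather than a bare yes/no, keeps the "which metric broke" detail available for the alert.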
Strengths

Stops budget waste on dying creative. Holds campaign CPA. Forces a refresh rhythm.

Trade-offs

Needs a steady creative pipeline. Maintenance, not discovery.

Common mistake

Replacing with a new concept instead of a variation — fresh concepts need their own test; a variation inherits 70–80% of the winner.

On autopilot with Scalemate
  1. Always-on monitor — watches every creative for fatigue triggers.
  2. Slack — instant DM with creative ID + which metric broke.
  3. Auto-pause — cuts the fatiguing creative the same hour.
  4. Auto-swap — pulls the next variant from a tagged Drive queue. Zero campaign holes.
  5. Weekly — which creatives fatigued, replacement performance, cycle time.
Budget: <$10K/mo · Duration: 14d · Setup: once
Flow: Always-on monitor · 4 fatigue triggers
  1. Always-on monitor: every live winner · 24/7
  2. Triggers (any one fires): Freq > 3.0 · CTR −20% · CPA +30% · Hook −15%
  3. Slack DM: creative ID + metric
  4. Auto-pause: cuts fatiguing creative
  5. Auto-swap variation: next from tagged queue
  6. Weekly report: fatigued · replacements · cycle time
Controlled comparison
Meta · TikTok

Control Ad Test (Equal Impressions vs Winner)

The cleanest yes/no on whether a challenger should replace your champion. Equal impressions, then stop — no time or audience bias. Compare by IPM (app) or CVR (web).

Full playbook — setup, thresholds & common mistake
Best for

Teams with an established winner · iterating a replacement · app teams (IPM), web teams (CVR).

Setup
  1. Use your proven winner as control.
  2. 2 ad sets (identical audience) OR 1 ad set with 2 ads, standard delivery (not Dynamic Creative).
  3. Run until both hit the same impressions (5K / 10K / 20K).
  4. Pause both at target — don't let the leader run on.
  5. Wait 24–48h for attribution, compare by IPM / CVR / lead CPA.
Strengths

Cleanest head-to-head. Controls time-of-day, day-of-week, audience drift. Equal sample sizes.

Trade-offs

Needs manual impression monitoring. Attribution lag delays the call. Not parallelizable (5 challengers = 5 runs).

Common mistake

Stopping one at target but letting the other run 'a bit longer' — re-introduces time bias. If one lags, raise its budget to catch up.

On autopilot with Scalemate
  1. Template — 2 ad sets, or 1 ad set with 2 ads (no Dynamic Creative).
  2. Equal-impressions auto-pause — both pause within minutes of the target, no manual watching.
  3. Attribution lag — reporting holds 48h post-pause so late conversions count.
  4. Decision metric — IPM (app) / CVR (web) / lead CPA per ad.
  5. Winner — at ≥1.1× control, clone to BAU + auto-pause the old control.
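
For an app team, the decision in step 5 reduces to an IPM comparison at matched impressions. A minimal sketch: IPM is installs per 1,000 impressions; the 2% impression tolerance and the function names are assumptions added for illustration.

```python
def ipm(installs, impressions):
    """Installs per mille: installs per 1,000 impressions."""
    return 1000 * installs / impressions


def challenger_wins(control, challenger, min_ratio=1.1, tolerance=0.02):
    """control / challenger: (installs, impressions) after the attribution hold.

    Refuses to compare if impressions differ by more than `tolerance`,
    since unequal samples re-introduce the bias the test is meant to remove.
    """
    (c_inst, c_imps), (ch_inst, ch_imps) = control, challenger
    if abs(c_imps - ch_imps) > tolerance * max(c_imps, ch_imps):
        raise ValueError("impressions not matched; let the laggard catch up")
    return ipm(ch_inst, ch_imps) >= min_ratio * ipm(c_inst, c_imps)


strong = challenger_wins(control=(90, 10_000), challenger=(104, 10_050))
weak = challenger_wins(control=(90, 10_000), challenger=(95, 10_000))
print(strong, weak)
```

The first challenger reaches about 10.35 IPM against a 9.9 bar (1.1 × 9.0) and wins; the second improves on control (9.5 IPM) but not by enough to justify replacing the champion.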
Budget: $10–40K/mo · Duration: 4d · Creatives: 2 · Setup: 15 min
Flow: Champion vs Challenger · equal impressions
  1. Champion (proven control) vs Challenger (new variation)
  2. Equal impressions: both stop at 5K / 20K
  3. Hold 48h: attribution lag
  4. Challenger ≥ 1.1× control? On decision metric
  5. No: champion stays, challenger dropped. Yes: challenger wins, clone to BAU · retire control
Controlled comparison
Meta

Meta Native A/B Test

Let Meta do the stats. The native Experiments tool splits the audience, runs the test and declares a winner with confidence. For teams without a data analyst.

Full playbook — setup, thresholds & common mistake
Best for

Teams wanting native attribution + Meta-handled splitting + a clear verdict, no manual setup.

Setup
  1. Ads Manager → Experiments → A/B Test.
  2. Pick 2–4 ads / ad sets / campaigns.
  3. Set duration (7–14d min) or a spend cap per variant.
  4. Meta splits the audience — no overlap.
  5. Judge by your chosen metric: hook rate, CTR, CVR, or CPA.
Strengths

Meta handles the math. Auto-splits to remove overlap bias. Clear 'Variant X won, Y% confidence.'

Trade-offs

7-day min even for small tests. Only 2–4 variants. Meta optimizes for confidence, not always your metric.

Common mistake

Setting the wrong primary metric — pick CTR and Meta crowns the high-CTR variant even if its CPA is worse. Set primary = the metric you'll act on.

On autopilot with Scalemate
  1. Parallel monitor — Scalemate tracks CPA + ROAS + hook rate per variant alongside Meta.
  2. Slack on verdict — Meta's call + Scalemate's full breakdown, side by side.
  3. Disagreement check — flags when Meta's winner conflicts with your business KPI.
  4. Auto-promote — if both agree, clone the winner to scaling.
  5. Loser archive — losing variants tagged in Drive for future learning.
Budget: $10–40K/mo · Duration: 10d · Creatives: 4 · Setup: 10 min
Flow: Meta Experiments · native split · 7–14d
  1. Ads Manager → Experiments → A/B Test
  2. Variants A–D: 2–4 ads / ad sets
  3. Meta splits audience: no overlap · 7–14 days
  4. Scalemate parallel: CPA · ROAS · hook rate
  5. Disagreement: Meta winner ≠ your KPI. Both agree: auto-promote winner
Mobile UA (cost-optimized)
Meta · TikTok

Cheap Geo / WW Testing

Test 4–10× more creatives per dollar. Run early tests in T3 geos or WW MAI where CPI is a fraction of T1, then promote winners home. Mobile UA only — and only after correlation is proven.

Full playbook — setup, thresholds & common mistake
Best for

Mobile UA · $1K+/day per geo · high creative volume · proven cheap-geo→T1 correlation.

Setup
  1. MAI campaign in T3 geos (Indonesia, Philippines, Brazil, Vietnam) or a WW MAI biased to cheap inventory.
  2. Run creatives at 4–10× lower CPI than T1.
  3. Judge by IPM + CPI.
  4. Promote winners to a T1 AEO/ROAS campaign.
  5. Small accounts (<$1–2K/day): move winners straight to BAU.
Strengths

4–10× cheaper testing. More creatives per dollar. Fast install volume = fast signal.

Trade-offs

Only works when cheap-geo winners correlate with T1. Strong for utilities/casual games, weak for premium/payment-heavy apps.

Common mistake

Promoting T3 winners to T1 without validating correlation. Some win cheap on low-intent T3 audiences and flop in T1. Validate the top 5 first.

On autopilot with Scalemate
  1. Templates — T3/WW MAI test + T1 AEO/ROAS scaling, reusable.
  2. Bulk launch in T3 — drag the batch; cheap impressions start flowing.
  3. Auto-pause (Day 2) — cut creatives under IPM threshold; T3 signal is fast.
  4. Correlation gate — one-time manual check: top 5 T3 winners vs T1 baseline.
  5. Auto-promote — validated winners clone T3 → T1 with budget scaling.
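
The page leaves the correlation gate as a manual check and does not say how to run it. One possible way, offered here as an assumption rather than the documented method, is a Spearman rank correlation between each creative's T3 IPM and its T1 IPM. The formula below assumes no tied values; the figures are invented.

```python
def ranks(values):
    """Rank values from best (1) to worst (n); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# IPM for the same five creatives in cheap geos and in T1.
t3_ipm = [14.0, 12.5, 11.0, 9.5, 8.0]
t1_ipm = [6.1, 4.9, 5.3, 4.0, 3.2]
rho = spearman(t3_ipm, t1_ipm)
print(round(rho, 3))
```

A value near 1 means the cheap-geo ranking mostly carries over to T1; a value near 0 or below suggests T3 wins are not predictive for this app. With only five creatives the estimate is coarse, so treat it as a sanity check and pick your own pass threshold.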
Budget: $10–40K/mo · Duration: 6d · Creatives: 40 · Setup: 30 min
Flow: T3/WW MAI · IPM gate · validate to T1
  1. T3 / WW MAI campaign: Indonesia · Philippines · Brazil
  2. Day 2 auto-pause: cut below IPM threshold
  3. Correlation gate: manual, top 5 T3 vs T1 baseline
  4. T3 ↔ T1 correlation OK? Before mass promotion
  5. No: reject T3 win (cheap-geo only). Yes: promote → T1 (AEO/ROAS scaling)
Mobile UA (cost-optimized)
Meta · TikTok

Cheap Geo + AEO (Combined)

Cheap-geo prices, AEO-quality signal. Filter creatives on the event that actually matters (d3 retention, first purchase, level-5) without paying T1 CPIs. Needs 50+ AEO events/week.

Full playbook — setup, thresholds & common mistake
Best for

Mobile UA w/ proven cheap-geo AEO correlation · enough conversion volume for AEO learning.

Setup
  1. AEO campaign in T3 cheap geos.
  2. Pick an AEO event tied to monetization (d3 retention, 1st purchase, level-5, 1st top-up).
  3. Test creatives; score on AEO event rate, not just IPM.
  4. Promote winners to a T1 AEO campaign, same event.
Strengths

Cost savings + signal quality. AEO outcomes mean more than raw installs.

Trade-offs

AEO needs volume — under ~50 events/week the learning phase never closes. Validate event throughput first.

Common mistake

Running AEO on thin volume. ~50 events/week minimum; at 10/week you get random delivery, not optimization.

On autopilot with Scalemate
  1. Template — cheap-geo AEO campaign tied to your monetization event.
  2. Bulk launch — load the batch into the AEO campaign in cheap geos.
  3. Event-volume safety — auto-pause the campaign if AEO events < 50/week and switch to MAI (Method 11, Cheap Geo / WW Testing).
  4. Auto-pause — cut creatives under event-rate threshold after 5–7 days.
  5. Auto-promote — validated winners clone to T1 AEO, same objective.
Budget: $40K+/mo · Duration: 7d · Creatives: 25 · Setup: 45 min
Flow: T3 AEO · event-volume safety · promote
  1. T3 AEO campaign: tied to monetization event
  2. AEO events ≥ 50/wk? Learning-phase safety
  3. No: auto-pause campaign, switch to MAI (method 11). Yes: volume OK, continue testing
  4. Day 5–7 cut: below event-rate threshold
  5. Promote → T1 AEO: same monetization event
Mobile UA (cost-optimized)
Meta · TikTok

Mirror-BAU Testing

Test creatives in the exact conditions they'll have to survive. Clone your BAU campaign, swap in 2–4 new creatives, run 5–7 days in parallel. Priciest per test, most reliable signal — no T3→T1 gap.

Full playbook — setup, thresholds & common mistake
Best for

Mature mobile UA where cheap-geo correlation failed · creatives that must work in production.

Setup
  1. Duplicate your BAU campaign exactly — optimization, audience, placements, geos.
  2. Swap in the new creatives only.
  3. Run 5–7 days alongside BAU.
  4. Compare by BAU metric (IPM, CPI, AEO event rate, ROAS).
  5. Winners replace fatiguing BAU creatives.
Strengths

Most reliable signal of any UA method. Winners behave the same when scaled. No correlation gap.

Trade-offs

Priciest per creative — full T1/BAU CPI. Not for high-volume iteration (cost scales linearly).

Common mistake

Testing too many at once dilutes BAU and spreads spend thin. Limit to 2–4 per mirror cycle.

On autopilot with Scalemate
  1. One-click BAU clone — exact optimization, audience, placements, geos. No config drift.
  2. Bulk swap — drag 2–4 new Drive creatives, replace only the creatives.
  3. Auto-pause (Day 3) — cut new creatives whose CPI exceeds 1.3× the BAU baseline.
  4. Day 5–7 — per-creative CPA + AEO rate vs BAU baseline.
  5. Auto-replace — winners clone into BAU; fatigued BAU creative auto-paused.
Budget: $40K+/mo · Duration: 7d · Creatives: 4 · Setup: 20 min
Flow: Clone BAU · swap creatives · run parallel
  1. Clone BAU campaign: same audience · same placements
  2. Swap 2–4 creatives: fresh challengers only
  3. Run 5–7 days parallel vs. live BAU
  4. Beats BAU CPA? Side-by-side compare
  5. No: discard challenger, BAU continues. Yes: promote to BAU, swap into live
Find winners fast
Meta · TikTok

CBO Spend-Gated Test (1 Ad/Set)

Isolate every variant in its own ad set. CBO allocates spend; pause gates fire at $60 (CPI check) and $150 (CPA check). Survivors scale 1.5× and push as new ads into every BAU campaign.

Full playbook — setup, thresholds & common mistake
Best for

Teams wanting isolated $-gated reads · clean per-variant signal · no in-adset competition.

Setup
  1. 1 CBO campaign · 1 ad set per variant · 1 ad each.
  2. Phase 1 — at $60 ad-set spend, pause if CPI > 1.5× target.
  3. Phase 2 — at $150 ad-set spend, pause if CPA > 1.5× target.
  4. Survivors: scale 1.5× budget + clone as new ad into every active BAU campaign.
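
The two spend gates above can be written as a single stateless check per ad set. This is a sketch with a hypothetical function name; it assumes an ad set that reaches $150 has already passed the $60 CPI gate, so only CPA is read at that point.

```python
def spend_gate_action(spend, installs, conversions, target_cpi, target_cpa):
    """Return 'continue', 'pause' or 'scale' for one ad set."""
    if spend >= 150:
        # Phase 2: CPA check. No conversions means no finite CPA.
        cpa = spend / conversions if conversions else float("inf")
        return "pause" if cpa > 1.5 * target_cpa else "scale"
    if spend >= 60:
        # Phase 1: CPI check.
        cpi = spend / installs if installs else float("inf")
        return "pause" if cpi > 1.5 * target_cpi else "continue"
    return "continue"  # below the first gate: not enough spend to judge


actions = [
    spend_gate_action(40, 10, 0, target_cpi=2.0, target_cpa=40.0),
    spend_gate_action(62, 20, 0, target_cpi=2.0, target_cpa=40.0),
    spend_gate_action(70, 35, 1, target_cpi=2.0, target_cpa=40.0),
    spend_gate_action(155, 70, 5, target_cpi=2.0, target_cpa=40.0),
]
print(actions)
```

Gating on spend rather than on days means a fast-spending account reaches a verdict sooner, and a slow one is not judged on too little data.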
Strengths

Clean per-variant read — no ads competing inside an ad set. Spend-based gates adapt to account pace. Survivors land in BAU as proven, not theoretical.

Trade-offs

More ad sets = more learning phases to feed. CBO may underspend low-CTR variants before they hit Phase 1 gate. Fights Andromeda's 'feed many in one' instinct.

Common mistake

Setting CPI/CPA targets too tight. Use the BAU running average × 1.5, not an aspirational target — otherwise you cut variants that just needed a wider audience pocket.

On autopilot with Scalemate
  1. Template — CBO campaign, 1 ad set per variant, 1 ad each.
  2. Bulk launch — drag N creatives from Drive; Scalemate auto-creates N ad sets + 1 ad per set.
  3. Phase 1 auto-pause — cut when adset spend ≥ $60 AND CPI > 1.5× target.
  4. Phase 2 auto-pause — cut when adset spend ≥ $150 AND CPA > 1.5× target.
  5. Auto-promote — survivor clones at 1.5× budget + adds as new ad to every active BAU campaign.
Budget: $10K+/mo · Duration: 5–7d · Creatives: 8 · Setup: 15 min
Flow: 1 ad/set · $60 + $150 spend gates · BAU sync
  1. CBO campaign: 1 ad set per variant · 1 ad each (Ad sets 1–4)
  2. Phase 1 · $60 spend, CPI check: pause ad set if CPI > 1.5× target; otherwise continue to Phase 2
  3. Phase 2 · $150 spend, CPA check: pause ad set if CPA > 1.5× target; otherwise scale 1.5× + BAU (clone as new ad to all BAU)

How Scalemate’s automated creative testing platform runs any method in the library

One engine behind every method — you pick the template, the platform runs the test. Most creative testing tools and ad creative testing platforms stop at a dashboard; this one launches the batch, cuts losers, scales winners, then loops results back so each cycle runs faster and bigger.

  1. Pick a method template

    Preset structure — the exact ad set count, budget split, and audience for each method.

  2. Bulk launch from Google Drive

    Drag 30 creatives at once — auto-named and pushed into the right ad sets.

  3. Auto-pause on the method's schedule

    Losers paused on threshold (CPA, ROAS, IPM) without you logging in.

  4. Winners auto-clone to scaling

    Hit the promotion threshold and the creative duplicates into your scale campaign automatically.

  5. Slack summary every morning

    Paused, promoted, burning budget — every test's state, without opening Ads Manager.

  6. Sync results back to your own stack

    Connect Scalemate to your system and push every test result — winners, losers, hook rates — straight into your database or creative pipeline via API. Your generation tool spins up the next batch from what actually won, and the loop runs itself.

Frequently asked questions

What is creative testing?

Creative testing — also called ad creative testing or creative ad testing — is the process of running multiple ad creatives against each other to find which ones drive the cheapest, highest-quality conversions before you put real budget behind them. On Meta and TikTok it means structured tests like 3-3-3, Hooks Tests, creative A/B testing, dynamic creative testing and multi-variant batteries, judged on CPA, ROAS, IPM or hook rate. This library collects 14 of those methods so you can match one to your budget and goal — and automate it without stitching together separate creative testing tools.

Do I need Scalemate to run these methods?

No. Every method is fully documented as a manual setup — run any of the 14 in Ads Manager by hand. The automation is just the optional shortcut: same method, launched from a template and watched for you.

What is the best Facebook creative testing framework?

There isn't one best Facebook creative testing framework — it depends on your goal, budget and platform. 3-3-3 is the common starting point for solo Meta buyers; post-Andromeda teams shipping 20+ creatives a week default to multi-variant battery testing on broad targeting. Use the filters above to match a method to your account.

How many creatives should I test on Meta in 2026?

For Meta ads creative testing in 2026, Andromeda changed the math. Under $5K/mo you can still run 5-10 across 2-3 ad sets; broad Advantage+ teams ship 10-30 per ad set, and above $50K/mo it's 50-100 a week as fatigue cycles compressed to 7-14 days.

How did Andromeda change creative testing on Meta?

Andromeda rewards creative volume and diversity over narrow audience targeting — broad ad sets carrying 10-30 distinct creatives now beat fragmented, hyper-targeted setups. Two practical shifts: test whole concepts, not 3 cuts of one ad (Andromeda's Entity ID dedup collapses near-identical variants into a single delivery slot), and ship fresh creatives weekly because fatigue cycles compressed to 7-14 days. The Multi-Variant Battery and Creative Refresh Cadence methods above are built for exactly this.

Does creative testing work the same way on TikTok?

The frameworks carry over, but the signals don't. TikTok burns through creative faster, rewards native-feeling hooks in the first 1-2 seconds, and early on you lean on hook rate and IPM more than Meta-style CPA windows. Most methods in this library run on both platforms — each card tags whether it applies to Meta, TikTok, or both.

How is creative testing different for mobile app install campaigns?

App-install UA judges creatives on IPM, CPI and downstream signals like d3 retention — not just front-end CPA — so cheap, high-volume testing matters even more than in e-commerce. That's why mobile UA teams run cost-optimized frameworks: Cheap Geo (validate in tier-3 / WW geos before spending tier-1 budget), Cheap Geo + AEO, and Mirror-BAU. All three are in the library above, filtered under Mobile UA.

What is the 3-3-3 method?

3 ad sets × 3 creatives × 3 days — each ad set runs 3 distinct creatives for 72 hours, then you scale the winner and cut the rest. Popularized by Pilothouse; fast and cheap, but 3 days is tight signal on small budgets. Full breakdown in Method 01 above.

Is Meta's A/B Test the same as a Conversion Lift test?

No. A/B Test compares two variants that are both running; Conversion Lift uses a true hold-out — some users see no ads — to measure real incrementality. Conversion Lift also isn't self-serve for most accounts; A/B Test is the one in your Experiments menu.

How long should a creative test run?

Depends on the method: 72h for 3-3-3, 5 days for the 3-2-2 sprint, 6-12 days for Bulk CBO, 7-14 days for Meta's A/B Test. The common mistake is running the method's cutoff at half the budget it needs — then your early signal is mostly noise.

Set creative testing to autopilot.

Test more. Find winners faster. Learning compounds. Meta + TikTok — free tier, no credit card.