Meta Ads · TikTok Ads · Creative testing frameworks

Stop guessing what creative testing framework to run.

The creative testing methods media buyers actually use across apps, e-commerce and mobile games. Automate the one you pick and make your ads perform better.

Each card: setup, pause criteria, budget floor & the common mistake
Free, no email gate — automate any method when you’re ready

Find your method · Start for free

14: Methods
2: Platforms
20+: Auto-rules
Free: No gate

Creative testing is brutal on every front.

Winners are rare — everything rides on them.

Most creatives never win. A handful carry the whole account, then fatigue in 7–14 days.

Which method even works now?

Andromeda rewrote Meta overnight. TikTok plays by other rules. Nobody knows how to test creatives in 2026.

Every test eats a day by hand.

Set up, launch, babysit, score, log — by hand, for every test. Skip the last step and the lesson's gone.

The playbook

14 frameworks that ship winners
Find yours.

Setup, pause thresholds, budget floor, common mistake — the full playbook per method.

14 production-grade methods.

3-3-3, Hooks Test, Multi-Variant Battery, Mirror-BAU, Cheap Geo, Conversion Lift — what real media buyers run.

↳ Free · no email gate

Filtered to your reality.

Chip filters by goal, budget, platform. No theoretical tests that won't fit your account.

↳ Match in 30 seconds

Every card is runnable today.

Setup, pause thresholds, budget floor, common mistake — full playbook per method.

↳ Manual or automated · your call

Explore the library

Your framework · our engine · zero ops

Pick a framework. Set it on autopilot. Faster winners. Hours back.

Set thresholds once. Launch · pause · promote · refresh · digest — runs automatically for every method.

Automatic, always.

Pause weak. Clone winners 2×. Refresh fatigue. Overnight.

↳ 20+ rules out of the box

First winners in 72 hours.

Worst-performer pause Day 3. Top-2 promoted Day 5. No waiting three weeks for signal.

↳ Same gates · every test

Iterate weekly, nothing drops.

Refresh cadence built in. Andromeda gets its 20–30 fresh variants. Nothing rots in BAU.

↳ Frequency · CTR · CPA · hook-rate · 24/7

Hours back from Ads Manager.

Stop logging in at midnight to pause an ad set. Spend the hours on new hooks, offers, angles.

↳ One Slack digest · three dashboards gone

↳ Two ways to start

Try it now · Have us set it up

↳ One week on autopilot
You set thresholds once · Scalemate does the rest

Mon 09:00 · Bulk launch: 30 creatives from Drive · auto-named
Mon 09:42 · First test live: all ad sets running
Tue · Auto-pause: ad sets below CTR threshold
Wed (Day 3) · Worst-performer pause: CPA > 2× target
Thu (Day 4) · Top performers ranked: by CPA per ad set
Fri (Day 5) · Winner promoted: auto-clones into BAU at 2× budget
Sat (ongoing) · Refresh monitor: watches frequency · CTR · CPA
Daily 09:00 · Slack digest: paused · promoted · spend

All 14 creative testing methods

Production-tested creative testing methods for Meta + TikTok — sorted by goal, budget and platform. Pick one, run it by hand, or put it on autopilot.

Find winners fast

Meta · TikTok

The 3-3-3 Method

Your first winners in 72 hours. 3 ad sets × 3 creatives × 3 days — the fast, cheap way to find a starter winner. Works across subscription apps, e-commerce and gaming, at any account size.

Full playbook — setup, thresholds & common mistake

Best for

Meta & TikTok · subscription apps, e-commerce, gaming · $40K+/mo accounts.

Setup

3 distinct concepts — different hooks/angles, not 3 cuts of one ad.
3 ad sets, identical targeting + budget ($50–150/day each), 1 ad each.
Run 72h, no edits.
Day 4: pause ad sets at CPA > 1.5× target; scale the winner.
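
The Day 4 call above can be sketched in a few lines of Python. This is a minimal sketch, not Scalemate's implementation: the `AdSet` record, the `day4_verdict` function and the sample numbers are hypothetical, and "winner" is assumed to mean the lowest-CPA ad set that passes the 1.5× gate.

```python
from dataclasses import dataclass


@dataclass
class AdSet:
    name: str
    spend: float       # total spend over the 72h run
    conversions: int   # conversions attributed to the ad set

    @property
    def cpa(self) -> float:
        # No conversions means an unbounded CPA, so the ad set fails any gate.
        return self.spend / self.conversions if self.conversions else float("inf")


def day4_verdict(ad_sets, target_cpa, gate=1.5):
    """Pause ad sets with CPA > gate x target; scale the cheapest survivor."""
    survivors = [a for a in ad_sets if a.cpa <= gate * target_cpa]
    paused = [a.name for a in ad_sets if a.cpa > gate * target_cpa]
    winner = min(survivors, key=lambda a: a.cpa).name if survivors else None
    return {"winner": winner, "paused": paused}


# Hypothetical results after 72 hours, target CPA = $20 (gate = $30).
results = [
    AdSet("Concept A", spend=300.0, conversions=20),  # CPA $15.00
    AdSet("Concept B", spend=300.0, conversions=9),   # CPA $33.33
    AdSet("Concept C", spend=300.0, conversions=12),  # CPA $25.00
]
print(day4_verdict(results, target_cpa=20.0))
```

With these numbers Concept B is paused, Concept A is scaled, and Concept C passes the gate without being the winner.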

Strengths

Fast · cheap · forces 3 distinct concepts.

Trade-offs

Noisy under $100/day. 3 days = before learning phase exits. Tests creative only, not audience fit.

Common mistake

3 versions of one ad isn't 3-3-3 — variation's too narrow and Andromeda picks the wrong winner.

On autopilot with Scalemate

Template — save a 3-3-3 structure: 3 ad sets, 1 ad each, identical targeting + budget.
Bulk launch — swap in 3 Drive creatives, launch all 3 ad sets in 2 clicks.
Auto-pause — Day 4, cut any ad set at CPA > 1.5× target.
Auto-promote — winner clones into your scaling campaign at 2× budget.
Slack — daily status + winner called on Day 4.

Budget: $40K+/mo · Duration: 3d · Creatives: 9 · Setup: 10 min

Automate this method

3 concepts · 72h · CPA gate

Bulk launch: 3 ad sets
Concept A: 1 ad
Concept B: 1 ad
Concept C: 1 ad
Run 72 hours: no edits
CPA ≤ 1.5× target? (evaluated day 4)
✕ Pause set: loser
★ Scale 2×: winner
Slack the team: winner ID + scale plan
Log to Google Sheets: assign test status · build the library

Andromeda-ready

Meta · TikTok

Multi-Variant Battery (Andromeda Native)

Stop fighting the algorithm. Load 7–20 creatives — different formats and concepts — into one broad ad set and let Andromeda find the pockets.

Full playbook — setup, thresholds & common mistake

Best for

Shopping/Sales · broad audiences · post-Andromeda accounts.

Setup

1 broad campaign, demographics-only targeting.
Single ad set, single objective.
Load 7–20 creatives — different formats and concepts.
Phase 1 — at 48h, pause any creative with CPA > 1.5× target.
Phase 2 — at 72h, scale survivors with CPA ≤ target (2× + add to BAU); pause the rest.
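
As an illustration of the two gates above, here is a minimal Python sketch. The function names, creative names and CPA figures are hypothetical; only the thresholds (1.5× target at 48h, target at 72h) come from the setup steps.

```python
def phase1_pause(cpa_by_creative, target_cpa):
    """48h gate: return creatives whose CPA exceeds 1.5x target."""
    return sorted(c for c, cpa in cpa_by_creative.items() if cpa > 1.5 * target_cpa)


def phase2_split(cpa_by_creative, target_cpa):
    """72h gate: creatives at or under target scale; the rest pause."""
    scale = sorted(c for c, cpa in cpa_by_creative.items() if cpa <= target_cpa)
    pause = sorted(c for c, cpa in cpa_by_creative.items() if cpa > target_cpa)
    return scale, pause


target = 25.0  # hypothetical target CPA, so the Phase 1 gate is $37.50

# Hypothetical CPA readings at 48h.
cpa_48h = {"ugc_video": 22.0, "carousel": 41.0, "static_offer": 30.0, "founder_story": 36.0}
cut = phase1_pause(cpa_48h, target)
still_live = [c for c in cpa_48h if c not in cut]

# Hypothetical CPA readings at 72h for the creatives that stayed live.
cpa_72h = {"ugc_video": 21.0, "static_offer": 27.5, "founder_story": 24.0}
scale, pause = phase2_split({c: cpa_72h[c] for c in still_live}, target)

print("paused at 48h:", cut)
print("scale at 72h:", scale, "| paused at 72h:", pause)
```

Note that a creative can pass Phase 1 (CPA within 1.5× target) and still be paused in Phase 2, because the second gate is the target itself.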

Strengths

Algorithm-aligned. No manual audience-creative matching. Scales — more creatives = more pockets.

Trade-offs

Needs 7–20 creatives. Hard to explain wins to stakeholders. Hard to attribute to specific choices.

Common mistake

Loading 20 near-identical creatives. Andromeda needs real diversity (format + concept + angle), not 20 carousels.

On autopilot with Scalemate

Template — broad targeting, single ad set + objective.
Bulk launch — drag the full batch; all creatives load into one ad set.
Phase 1 auto-pause — cut any creative with CPA > 1.5× target at 48h.
Phase 2 auto-promote — scale 2× + clone to every BAU campaign for survivors with CPA ≤ target at 72h.
Slack alert — new winner posted the moment a creative passes Phase 2.

Budget: $10–40K/mo · Duration: 3d · Creatives: 14 · Setup: 15 min

Automate this method

Broad campaign · 7–20 creatives · 2-phase CPA gates

Broad campaign: 1 ad set · single objective
Load 7–20 creatives: different formats × concepts
Andromeda allocates: finds the pockets
Phase 1 · 48h: CPA check
✕ Pause creative: CPA > 1.5× target
Continue → P2: CPA in range
Phase 2 · 72h: winner check
✕ Pause: CPA > target
★ Scale 2× + BAU · Slack: new winner ✓

Find winners fast

Meta · TikTok

The 3-2-2 Method (5-Day Sprint)

A scaling-ready verdict in 5 days, not 14. Best when you're testing communication hypotheses in your creative — 3 different angles, 2 variations each.

Full playbook — setup, thresholds & common mistake

Best for

Testing creative communication hypotheses · 3 angles × 2 variations · a primary + backup winner in 5 days · $40K+/mo accounts.

Setup

3 ad sets × 2 creatives, 5-day run.
Identical audience + budget ($150–300/day each — higher budget compresses signal).
2 creatives compete inside each ad set.
Day 5: pick winner per ad set by CPA; move top + #2 to scaling.

Strengths

Faster than the 14-day version. In-ad-set variation suits Andromeda.

Trade-offs

Shorter window = more noise. Needs a daily budget of at least 3× your target CPA per ad set.
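
The 3× target-CPA floor is simple arithmetic. A small sketch, with hypothetical function names and an assumed $50 target CPA, shows what it implies for a 3-ad-set, 5-day sprint:

```python
def min_daily_budget(target_cpa, multiple=3.0):
    """Daily budget floor per ad set: a multiple of the target CPA."""
    return multiple * target_cpa


def sprint_budget(target_cpa, ad_sets=3, days=5):
    """Minimum spend for the whole sprint at the per-ad-set floor."""
    per_ad_set = min_daily_budget(target_cpa)
    return {
        "per_ad_set_daily": per_ad_set,
        "daily_total": per_ad_set * ad_sets,
        "sprint_total": per_ad_set * ad_sets * days,
    }


# Assumed target CPA of $50: the floor lands at $150/day per ad set.
print(sprint_budget(50.0))
```

At a $50 target CPA the floor equals the low end of the $150–300/day range in the setup; a higher target CPA pushes the floor up with it.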

Common mistake

Throwing random ads into each ad set. Each ad set should test one hypothesis — e.g. same angle, different CTAs — so you learn why the winner won, not just which ad won.

On autopilot with Scalemate

Template — 3 ad sets, 2 ads each, $150–300/day, identical targeting.
Bulk launch — load 6 Drive creatives (2 per ad set) in 1 click.
Auto-pause — Day 3, cut the worst ad sets at CPA > 1.5× target.
Day 5 — Scalemate surfaces winner + #2 per ad set automatically.
Auto-promote — top winner + backup clone to scaling at 2–3× budget.

Budget: $40K+/mo · Duration: 5d · Creatives: 6 · Setup: 15 min

Automate this method

5-day verdict · worst-performer pause · top 2

Bulk launch: 3 ad sets · 6 ads
Ad set A: 2 ads
Ad set B: 2 ads
Ad set C: 2 ads
Day 3 · worst-performer pause: pause if CPA > 2× target
✕ Worst paused: removed day 3
Survivors run on: to day 5
Day 5 · rank top 2: best CPA per ad set
★ Scale best ad: own ad set · 2–3×
Keep 2nd best: fallback if leader tires
Add top 2 → BAU: into all active campaigns

Find winners fast

Meta · TikTok

Hooks Test

Hook rate decides what spend a creative gets early — the body decides what converts to purchase. Hold a body that already converts, swap hooks, scale the one that unlocks volume.

Full playbook — setup, thresholds & common mistake

Best for

Video-heavy teams (mobile UA, DTC) · refining a proven concept · $5K+/mo accounts.

Setup

Take one winning video.
Make 3–5 variants — different first 3 seconds only.
Keep body + end identical.
Run in an Advantage+ CBO campaign — 1 ad set per hook, 1 ad each.
After 1K+ impressions/variant, cut bottom 50% by hook-rate.
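
The final cut can be sketched as follows. This is a hedged illustration: hook rate is assumed here to mean 3-second video views divided by impressions (the card does not define it), the function names and sample counts are hypothetical, and with an odd number of variants the "bottom 50%" is rounded down.

```python
def hook_rate(three_sec_views, impressions):
    """Assumed definition: share of impressions that reached 3 seconds."""
    return three_sec_views / impressions if impressions else 0.0


def cut_bottom_half(variants, min_impressions=1000):
    """variants maps name -> (three_sec_views, impressions).

    Returns None while any variant is under the impression floor,
    otherwise (keep, cut) ranked by hook rate, best first.
    """
    if any(imp < min_impressions for _, imp in variants.values()):
        return None  # too early to judge
    ranked = sorted(variants, key=lambda n: hook_rate(*variants[n]), reverse=True)
    n_cut = len(ranked) // 2          # 5 variants -> cut 2, keep 3
    keep = ranked[: len(ranked) - n_cut]
    cut = ranked[len(ranked) - n_cut:]
    return keep, cut


# Hypothetical counts for 5 hook variants on the same body.
variants = {
    "Hook A": (420, 1200),   # 35.0%
    "Hook B": (300, 1500),   # 20.0%
    "Hook C": (310, 1000),   # 31.0%
    "Hook D": (150, 1100),   # ~13.6%
    "Hook E": (500, 2000),   # 25.0%
}
print(cut_bottom_half(variants))
```

Ranking only within this one test keeps the comparison inside a single context, which is the point of the common mistake noted below.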

Strengths

Isolates the hook cleanly. Compounds a proven concept. Fast once you have a base winner.

Trade-offs

Needs editing capacity. Refines, doesn't discover. Hook-rate ≠ ROAS.

Common mistake

Comparing hook rates across different campaigns instead of within one test — hook-rate is context-dependent.

On autopilot with Scalemate

Template — Advantage+ CBO campaign, 1 ad set per hook, 1 ad each, primary metric = hook rate.
Bulk launch — drag 5 Drive variants into the ad set, launch.
Auto-pause — cut any variant whose hook rate is under 1.2× control after 2K impressions.
Winner — reporting tags the top hook for the next batch.
Next cycle — Slack alert when one variant clears 1.5× control to prompt iteration.

Budget: $5K+/mo · Duration: 7d · Creatives: 5 · Setup: 20 min

Automate this method

1 body · 5 hook variants

Winning concept: keep body, swap hook
5 hook variants · same body: Hook A · Hook B · Hook C · Hook D · Hook E
Advantage+ campaign · CBO: 1 ad set · 1 ad per hook
Auto-pause rule: keep only if hook-rate > 1.2× control · CPA ≤ target
✕ Variant paused: weak hook cut
★ Scale winner: ≥1.5× control by hook-rate

Validate before scaling

Meta Conversion Lift Test

Settle 'did the ad actually cause that sale?' with finance. A true hold-out measures incremental conversions — but it's gated to enterprise teams with a Meta rep and $30K+ for the study.

Meta-gated · measurement-only — the Lift study runs through Meta, not Scalemate.

Full playbook — setup, thresholds & common mistake

Best for

Enterprise w/ Meta rep · $75K+/mo accounts · validating a big change before scaling 2–3×.

Setup

Request a Lift study via your Meta rep.
Meta splits users into test (sees ads) vs hold-out (doesn't).
Run 4+ weeks for a valid sample.
Define the conversion event.
Meta reports incremental conversions + cost per incremental conversion.
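
Meta computes these figures inside the study, so the following is only an illustration of the idea, not Meta's methodology. It assumes the hold-out's conversions are scaled up to the test group's size before subtracting; the function names and all numbers are hypothetical.

```python
def incremental_conversions(test_conv, holdout_conv, test_users, holdout_users):
    """Conversions in the test group beyond what the hold-out rate predicts."""
    # Scale hold-out conversions to the size of the test group (assumption).
    expected_without_ads = holdout_conv * (test_users / holdout_users)
    return test_conv - expected_without_ads


def cost_per_incremental(spend, incremental):
    """Spend divided by incremental conversions; None when there is no lift."""
    return spend / incremental if incremental > 0 else None


# Hypothetical 4-week study: 900K users see ads, 100K are held out.
lift = incremental_conversions(
    test_conv=5400, holdout_conv=500, test_users=900_000, holdout_users=100_000
)
print(lift)                                  # conversions the ads added
print(cost_per_incremental(45_000.0, lift))  # spend per added conversion
```

In this example the hold-out rate predicts 4,500 conversions without ads, so only 900 of the 5,400 observed are incremental, and the cost per incremental conversion is far higher than the blended CPA would suggest.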

Strengths

Highest-quality incrementality Meta offers. Separates 'ad drove this' from 'would've converted anyway.' Ends attribution debates.

Trade-offs

Gated — needs a Meta rep. 4+ week cycle. $30K+ for the study. One decision at a time, not a daily tool.

Common mistake

Assuming it's the same as the A/B Test option. A/B compares two live variants; Lift uses a true hold-out. A/B can't measure incrementality.

On autopilot with Scalemate

Pre-Lift — Scalemate locks current variants (no rotation during the 4-week window).
During — dashboard reports CPA + ROAS in parallel so finance sees both.
Post-Lift — flip 'scaling mode' once incrementality is confirmed.
Auto-scale — +20%/day when ROAS > target × 1.1, auto-revert on a 15% drop.
Slack — daily scaling-action summary for team + finance.

Budget: $75K+/mo · Duration: 28d · Creatives: ∞ · Setup: by request

Set it up in Scalemate

Hold-out study · 4 weeks · incrementality

Meta rep request: Lift study via your rep
Test group: sees ads
Hold-out: no ads (control)
Run 4+ weeks: for valid sample
Define event: incremental conversions
✕ No lift: do not scale
★ Lift confirmed: scale 2–3×

Validate before scaling

Meta · TikTok

Bulk Creative Test in CBO (2-Phase Funnel Progression)

Test 12–75 creatives a cycle without funding duds. Phase 1 cuts the bottom 50–70% on cheap signal; Phase 2 validates survivors on revenue. For R&D + AI-creative pipelines.

Full playbook — setup, thresholds & common mistake

Best for

High-volume R&D teams · 30+ creatives/cycle · AI-generation pipelines · $10K+/mo accounts.

Setup

1 CBO campaign, 4–5 ad sets (broad or diversified lookalikes).
3–15 ads per ad set (12–75 total).
Phase 1 (3–5d): let CBO allocate; cut bottom 50–70% on upper-funnel signal (installs / registrations / add-to-cart).
Phase 2 (3–7d): keep top 20–40%; judge on lower-funnel (purchases / trials / paying users).
Move final 5–10 winners to scaling.

Strengths

Tests volume efficiently (12–75 vs 9). 2-phase saves budget. Built for AI creative pipelines. Matches how CBO allocates.

Trade-offs

Needs 12–75 creatives. Phase 1 metric choice is critical — wrong signal cuts real winners. 6–12 day cycle.

Common mistake

Treating Phase 1 metrics as final. High install rate ≠ high purchase rate. Phase 2 exists to weed those out — don't skip it.

On autopilot with Scalemate

Template — CBO campaign, 4–5 ad sets, single objective.
Bulk launch — drag 12–75 Drive creatives; Scalemate spreads 3–15 per ad set.
Phase 1 auto-pause (Day 3) — cut bottom 50–70% on upper-funnel events.
Phase 2 auto-pause (Day 5–8) — cut survivors failing lower-funnel CPA.
Auto-promote — final winners clone to scaling at 2–3×; batch tagged 'winner library.'

Budget: $10K+/mo · Duration: 12d · Creatives: 44 · Setup: 30 min

Automate this method

CBO · 12–75 ads · 2-phase funnel

CBO campaign: auto-budget · 4–5 ad sets
Ad set 1: 3–15 ads
Ad set 2: 3–15 ads
Ad set 3: 3–15 ads
Ad set 4: 3–15 ads
Phase 1 · day 3: upper-funnel signal
✕ Cut 50–70%: weak upper-funnel
Top 30–40%: advance to P2
Phase 2 · day 5–8: revenue signal
✕ Cut on revenue: no purchases
★ Scale 2–3×: 5–10 final winners

Cut losers fast

Meta · TikTok

Static vs Video Test

Stop guessing whether your offer wants video or static. 3 ad sets — static / video / mixed, same concept — tells you where to point production budget.

Full playbook — setup, thresholds & common mistake

Best for

New accounts · format transitions · vertical-specific calls (mobile UA, eCom Reels).

Setup

3 ad sets: static-only, video-only, mixed.
Same concept across all 3.
Same audience + budget ($100–200/day each).
Run 7–14 days.
Compare CPA, CTR, hook rate, watch-through by format.

Strengths

Ends the 'video always wins' assumption. Data-backed format strategy.

Trade-offs

Needs both formats produced. Result varies by funnel stage (TOFU = video, BOFU = static).

Common mistake

Calling video the winner on engagement when static had better CPA. Optimize for the business metric, not engagement.

On autopilot with Scalemate

Template — 3 ad sets (static / video / mixed), identical targeting + budget.
Bulk launch — drag 6 static + 6 video + 6 mixed; 6 per ad set in 1 click.
Auto-pause — cut anything under 0.5% CTR after 500 impressions.
Format reporting — dashboard shows aggregate CPA per format, not just per ad.
Auto-promote — winning format's top 2 clone to scaling; lock the format default.

Budget: $10–40K/mo · Duration: 5d · Creatives: 18 · Setup: 20 min

Automate this method

Same concept · 3 formats · CTR + CPA gate

Launch · same concept: 3 formats · equal budget
Static only: 6 ads
Video only: 6 ads
Mixed: 6 ads
✕ Auto-pause ads: CTR < 0.5% · 500 impr
Format reporting: aggregate CPA per format
★ Scale top 2: clone winning format
Lock format: as default

Andromeda-ready

Meta · TikTok

Creative Refresh Cadence

Catch fatigue before it eats your CPA. Watch frequency, CTR, CPA and hook-rate on every live creative and swap the moment one breaks. Mandatory post-Andromeda — cycles dropped to 7–14 days.

Full playbook — setup, thresholds & common mistake

Best for

Any team running winners 30+ days · critical post-Andromeda.

Setup

Track per-creative weekly: frequency, CPA, CTR, hook rate.
Replace when frequency > 3.0, CTR −20% from peak, CPA +30%, or hook-rate −15% (video).
Swap in a variation of the winner, not a brand-new concept.
Pause the fatigued creative immediately.
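
The four replacement triggers can be expressed as a single check. This is a minimal sketch under stated assumptions: the `CreativeStats` record and `fatigue_triggers` function are hypothetical, and "CPA +30%" is read as 30% above a baseline CPA that you record for the creative (the card does not say which baseline).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CreativeStats:
    frequency: float
    ctr: float
    peak_ctr: float
    cpa: float
    baseline_cpa: float                      # assumed reference point for "+30%"
    hook_rate: Optional[float] = None        # video only
    peak_hook_rate: Optional[float] = None   # video only


def fatigue_triggers(s: CreativeStats) -> List[str]:
    """Return the names of every trigger that fired; any one means replace."""
    fired = []
    if s.frequency > 3.0:
        fired.append("frequency")
    if s.ctr <= 0.80 * s.peak_ctr:           # CTR down 20% from peak
        fired.append("ctr")
    if s.cpa >= 1.30 * s.baseline_cpa:       # CPA up 30% from baseline
        fired.append("cpa")
    if s.hook_rate is not None and s.peak_hook_rate:
        if s.hook_rate <= 0.85 * s.peak_hook_rate:  # hook rate down 15%
            fired.append("hook_rate")
    return fired


# Hypothetical weekly readings for two creatives.
tired = CreativeStats(3.4, 0.011, 0.012, 26.0, 24.0, hook_rate=0.20, peak_hook_rate=0.30)
healthy = CreativeStats(1.8, 0.012, 0.012, 24.0, 24.0)
print(fatigue_triggers(tired))
print(fatigue_triggers(healthy))
```

Returning the list of fired triggers, rather than a single yes/no, makes it easy to report which metric broke.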

Strengths

Stops budget waste on dying creative. Holds campaign CPA. Forces a refresh rhythm.

Trade-offs

Needs a steady creative pipeline. Maintenance, not discovery.

Common mistake

Replacing with a new concept instead of a variation — fresh concepts need their own test; a variation inherits 70–80% of the winner.

On autopilot with Scalemate

Always-on monitor — watches every creative for fatigue triggers.
Slack — instant DM with creative ID + which metric broke.
Auto-pause — cuts the fatiguing creative the same hour.
Auto-swap — pulls the next variant from a tagged Drive queue. Zero campaign holes.
Weekly — which creatives fatigued, replacement performance, cycle time.

Budget: <$10K/mo · Duration: 14d · Creatives: n/a · Setup: once

Automate this method

Always-on monitor · 4 fatigue triggers

Always-on monitor: every live winner · 24/7
Freq > 3.0 · CTR −20% · CPA +30% · Hook −15% (any one fires)
Slack DM: creative ID + metric
✕ Auto-pause: cuts fatiguing creative
★ Auto-swap variation: next from tagged queue
Weekly report: fatigued · replacements · cycle time

Controlled comparison

Meta · TikTok

Control Ad Test (Equal Impressions vs Winner)

The cleanest yes/no on whether a challenger should replace your champion. Equal impressions, then stop — no time or audience bias. Compare by IPM (app) or CVR (web).

Full playbook — setup, thresholds & common mistake

Best for

Teams with an established winner · iterating a replacement · app teams (IPM), web teams (CVR).

Setup

Use your proven winner as control.
2 ad sets (identical audience) OR 1 ad set with 2 ads, standard delivery (not Dynamic Creative).
Run until both hit the same impressions (5K / 10K / 20K).
Pause both at target — don't let the leader run on.
Wait 24–48h for attribution, compare by IPM / CVR / lead CPA.
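
For an app campaign, the comparison step might look like the sketch below. IPM is installs per 1,000 impressions; the 1.1× win threshold is the one this card lists for the automated version, while the function names, the 2% impression tolerance and the sample counts are assumptions made for illustration.

```python
def ipm(installs, impressions):
    """Installs per 1,000 impressions."""
    return installs / impressions * 1000


def challenger_wins(control, challenger, threshold=1.1, tolerance=0.02):
    """True if the challenger's IPM is at least threshold x the control's.

    Refuses to compare when impressions differ by more than the tolerance,
    since unequal samples reintroduce the bias this test is meant to remove.
    """
    c_imp, ch_imp = control["impressions"], challenger["impressions"]
    if abs(c_imp - ch_imp) > tolerance * max(c_imp, ch_imp):
        raise ValueError("impressions are not equal enough to compare")
    control_ipm = ipm(control["installs"], c_imp)
    challenger_ipm = ipm(challenger["installs"], ch_imp)
    return challenger_ipm >= threshold * control_ipm


# Hypothetical counts after both ads were paused near 10K impressions
# and the attribution window has passed.
control = {"impressions": 10_000, "installs": 120}     # IPM about 12.0
challenger = {"impressions": 10_050, "installs": 140}  # IPM about 13.9
print(challenger_wins(control, challenger))
```

For a web campaign the same shape applies with CVR in place of IPM.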

Strengths

Cleanest head-to-head. Controls time-of-day, day-of-week, audience drift. Equal sample sizes.

Trade-offs

Needs manual impression monitoring. Attribution lag delays the call. Not parallelizable (5 challengers = 5 runs).

Common mistake

Stopping one at target but letting the other run 'a bit longer' — re-introduces time bias. If one lags, raise its budget to catch up.

On autopilot with Scalemate

Template — 2 ad sets, or 1 ad set with 2 ads (no Dynamic Creative).
Equal-impressions auto-pause — both pause within minutes of the target, no manual watching.
Attribution lag — reporting holds 48h post-pause so late conversions count.
Decision metric — IPM (app) / CVR (web) / lead CPA per ad.
Winner — at ≥1.1× control, clone to BAU + auto-pause the old control.

Budget: $10–40K/mo · Duration: 4d · Creatives: 2 · Setup: 15 min

Automate this method

Champion vs Challenger · equal impressions

Champion: proven control
Challenger: new variation
Equal impressions: both stop at 5K / 10K / 20K
Hold 48h: attribution lag
Challenger ≥ 1.1× control? (on decision metric)
Champion stays: challenger dropped
★ Challenger wins: clone to BAU · retire control

Controlled comparison

Meta Native A/B Test

Let Meta do the stats. The native Experiments tool splits the audience, runs the test and declares a winner with confidence. For teams without a data analyst.

Full playbook — setup, thresholds & common mistake

Best for

Teams wanting native attribution + Meta-handled splitting + a clear verdict, no manual setup.

Setup

Ads Manager → Experiments → A/B Test.
Pick 2–4 ads / ad sets / campaigns.
Set duration (7–14d min) or a spend cap per variant.
Meta splits the audience — no overlap.
Judge by your chosen metric: hook rate, CTR, CVR, or CPA.

Strengths

Meta handles the math. Auto-splits to remove overlap bias. Clear 'Variant X won, Y% confidence.'

Trade-offs

7-day min even for small tests. Only 2–4 variants. Meta optimizes for confidence, not always your metric.

Common mistake

Setting the wrong primary metric — pick CTR and Meta crowns the high-CTR variant even if its CPA is worse. Set primary = the metric you'll act on.

On autopilot with Scalemate

Parallel monitor — Scalemate tracks CPA + ROAS + hook rate per variant alongside Meta.
Slack on verdict — Meta's call + Scalemate's full breakdown, side by side.
Disagreement check — flags when Meta's winner conflicts with your business KPI.
Auto-promote — if both agree, clone the winner to scaling.
Loser archive — losing variants tagged in Drive for future learning.

Budget: $10–40K/mo · Duration: 10d · Creatives: 4 · Setup: 10 min

Automate this method

Meta Experiments · native split · 7–14d

Ads Manager → Experiments: A/B Test
Variant A · Variant B · Variant C · Variant D (2–4 ads / ad sets)
Meta splits audience: no overlap · 7–14 days
Scalemate parallel: CPA · ROAS · hook rate
✕ Disagreement: Meta winner ≠ your KPI
★ Both agree: auto-promote winner

Mobile UA (cost-optimized)

Meta · TikTok

Cheap Geo / WW Testing

Test 4–10× more creatives per dollar. Run early tests in T3 geos or WW MAI where CPI is a fraction of T1, then promote winners home. Mobile UA only — and only after correlation is proven.

Full playbook — setup, thresholds & common mistake

Best for

Mobile UA · $1K+/day per geo · high creative volume · proven cheap-geo→T1 correlation.

Setup

MAI campaign in T3 geos (Indonesia, Philippines, Brazil, Vietnam) or a WW MAI biased to cheap inventory.
Run creatives at 4–10× lower CPI than T1.
Judge by IPM + CPI.
Promote winners to a T1 AEO/ROAS campaign.
Small accounts (<$1–2K/day): move winners straight to BAU.

Strengths

4–10× cheaper testing. More creatives per dollar. Fast install volume = fast signal.

Trade-offs

Only works when cheap-geo winners correlate with T1. Strong for utilities/casual games, weak for premium/payment-heavy apps.

Common mistake

Promoting T3 winners to T1 without validating correlation. Some win cheap on low-intent T3 audiences and flop in T1. Validate the top 5 first.

On autopilot with Scalemate

Templates — T3/WW MAI test + T1 AEO/ROAS scaling, reusable.
Bulk launch in T3 — drag the batch; cheap impressions start flowing.
Auto-pause (Day 2) — cut creatives under IPM threshold; T3 signal is fast.
Correlation gate — one-time manual check: top 5 T3 winners vs T1 baseline.
Auto-promote — validated winners clone T3 → T1 with budget scaling.

Budget: $10–40K/mo · Duration: 6d · Creatives: 40 · Setup: 30 min

Automate this method

T3/WW MAI · IPM gate · validate to T1

T3 / WW MAI campaign: Indonesia · Philippines · Brazil
Day 2 auto-pause: cut below IPM threshold
Correlation gate: manual check, top 5 T3 vs T1 baseline
T3 ↔ T1 correlation OK? (before mass promotion)
✕ Reject T3 win: cheap-geo only
★ Promote → T1: AEO/ROAS scaling

Mobile UA (cost-optimized)

Meta · TikTok

Cheap Geo + AEO (Combined)

Cheap-geo prices, AEO-quality signal. Filter creatives on the event that actually matters (d3 retention, first purchase, level-5) without paying T1 CPIs. Needs 50+ AEO events/week.

Full playbook — setup, thresholds & common mistake

Best for

Mobile UA w/ proven cheap-geo AEO correlation · enough conversion volume for AEO learning.

Setup

AEO campaign in T3 cheap geos.
Pick an AEO event tied to monetization (d3 retention, 1st purchase, level-5, 1st top-up).
Test creatives; score on AEO event rate, not just IPM.
Promote winners to a T1 AEO campaign, same event.

Strengths

Cost savings + signal quality. AEO outcomes mean more than raw installs.

Trade-offs

AEO needs volume — under ~50 events/week the learning phase never closes. Validate event throughput first.

Common mistake

Running AEO on thin volume. ~50 events/week minimum; at 10/week you get random delivery, not optimization.

On autopilot with Scalemate

Template — cheap-geo AEO campaign tied to your monetization event.
Bulk launch — load the batch into the AEO campaign in cheap geos.
Event-volume safety — auto-pause the campaign if AEO events < 50/week (switch to MAI: Cheap Geo / WW Testing, Method 11).
Auto-pause — cut creatives under event-rate threshold after 5–7 days.
Auto-promote — validated winners clone to T1 AEO, same objective.

Budget: $40K+/mo · Duration: 7d · Creatives: 25 · Setup: 45 min

Automate this method

T3 AEO · event-volume safety · promote

T3 AEO campaign: tied to monetization event
AEO events ≥ 50/wk? (learning-phase safety)
✕ Auto-pause campaign: switch to MAI (Method 11, Cheap Geo / WW Testing)
Volume OK: continue testing
✕ Day 5–7 cut: below event-rate threshold
★ Promote → T1 AEO: same monetization event

Mobile UA (cost-optimized)

Meta · TikTok

Mirror-BAU Testing

Test creatives in the exact conditions they'll have to survive. Clone your BAU campaign, swap in 2–4 new creatives, run 5–7 days in parallel. Priciest per test, most reliable signal — no T3→T1 gap.

Full playbook — setup, thresholds & common mistake

Best for

Mature mobile UA where cheap-geo correlation failed · creatives that must work in production.

Setup

Duplicate your BAU campaign exactly — optimization, audience, placements, geos.
Swap in the new creatives only.
Run 5–7 days alongside BAU.
Compare by BAU metric (IPM, CPI, AEO event rate, ROAS).
Winners replace fatiguing BAU creatives.

Strengths

Most reliable signal of any UA method. Winners behave the same when scaled. No correlation gap.

Trade-offs

Priciest per creative — full T1/BAU CPI. Not for high-volume iteration (cost scales linearly).

Common mistake

Testing too many at once dilutes BAU and spreads spend thin. Limit to 2–4 per mirror cycle.

On autopilot with Scalemate

One-click BAU clone — exact optimization, audience, placements, geos. No config drift.
Bulk swap — drag 2–4 new Drive creatives, replace only the creatives.
Auto-pause (Day 3) — cut new creatives whose CPI exceeds 1.3× the BAU baseline.
Day 5–7 — per-creative CPA + AEO rate vs BAU baseline.
Auto-replace — winners clone into BAU; fatigued BAU creative auto-paused.

Budget: $40K+/mo · Duration: 7d · Creatives: 4 · Setup: 20 min

Automate this method

Clone BAU · swap creatives · run parallel

Clone BAU campaign: same audience · same placements
Swap 2–4 creatives: fresh challengers only
Run 5–7 days parallel: vs. live BAU
Beats BAU CPA? (side-by-side compare)
✕ Discard challenger: BAU continues
★ Promote to BAU: swap into live

Find winners fast

Meta · TikTok

CBO Spend-Gated Test (1 Ad/Set)

Isolate every variant in its own ad set. CBO allocates spend; pause gates fire at $60 (CPI check) and $150 (CPA check). Survivors scale 1.5× and push as new ads into every BAU campaign.

Full playbook — setup, thresholds & common mistake

Best for

Teams wanting isolated $-gated reads · clean per-variant signal · no in-adset competition.

Setup

1 CBO campaign · 1 ad set per variant · 1 ad each.
Phase 1 — at $60 ad-set spend, pause if CPI > 1.5× target.
Phase 2 — at $150 ad-set spend, pause if CPA > 1.5× target.
Survivors: scale 1.5× budget + clone as new ad into every active BAU campaign.
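
The two spend gates can be sketched as one function evaluated per ad set. This is an illustrative sketch only: `gate_action` and its return strings are hypothetical, and an ad set with no conversions yet is assumed to carry an infinite CPA.

```python
PHASE1_SPEND = 60.0    # CPI is checked once the ad set has spent $60
PHASE2_SPEND = 150.0   # CPA is checked once the ad set has spent $150
GATE = 1.5             # both gates allow up to 1.5x target


def gate_action(spend, cpi, cpa, target_cpi, target_cpa):
    """Return the action for one ad set given its spend and costs so far."""
    if spend < PHASE1_SPEND:
        return "keep running"
    if cpi > GATE * target_cpi:
        return "pause: phase 1 (CPI)"
    if spend < PHASE2_SPEND:
        return "keep running"
    if cpa > GATE * target_cpa:
        return "pause: phase 2 (CPA)"
    return "scale 1.5x + clone to BAU"


# Hypothetical targets: CPI $2.00 (gate $3.00), CPA $30.00 (gate $45.00).
no_conv = float("inf")  # assumed CPA when nothing has converted yet
print(gate_action(40.0, 9.0, no_conv, 2.0, 30.0))   # below the first gate
print(gate_action(75.0, 3.5, no_conv, 2.0, 30.0))   # fails the CPI check
print(gate_action(160.0, 2.5, 50.0, 2.0, 30.0))     # fails the CPA check
print(gate_action(160.0, 2.5, 40.0, 2.0, 30.0))     # survivor
```

Because the gates key on spend rather than days, a slow-spending ad set is simply left running until it reaches $60.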

Strengths

Clean per-variant read — no ads competing inside an ad set. Spend-based gates adapt to account pace. Survivors land in BAU as proven, not theoretical.

Trade-offs

More ad sets = more learning phases to feed. CBO may underspend low-CTR variants before they hit Phase 1 gate. Fights Andromeda's 'feed many in one' instinct.

Common mistake

Setting CPI/CPA targets too tight. Use the BAU running average × 1.5, not an aspirational target — otherwise you cut variants that just needed a wider audience pocket.

On autopilot with Scalemate

Template — CBO campaign, 1 ad set per variant, 1 ad each.
Bulk launch — drag N creatives from Drive; Scalemate auto-creates N ad sets + 1 ad per set.
Phase 1 auto-pause — cut when adset spend ≥ $60 AND CPI > 1.5× target.
Phase 2 auto-pause — cut when adset spend ≥ $150 AND CPA > 1.5× target.
Auto-promote — survivor clones at 1.5× budget + adds as new ad to every active BAU campaign.

Budget: $10K+/mo · Duration: 5–7d · Creatives: 8 · Setup: 15 min

Automate this method

1 ad/set · $60 + $150 spend gates · BAU sync

CBO campaign: 1 ad set per variant · 1 ad each
Ad set 1: 1 ad
Ad set 2: 1 ad
Ad set 3: 1 ad
Ad set 4: 1 ad
Phase 1 · $60 spend: CPI check
✕ Pause ad set: CPI > 1.5× target
Continue → P2: CPI in range
Phase 2 · $150 spend: CPA check
✕ Pause ad set: CPA > 1.5× target
★ Scale 1.5× + BAU: clone as new ad to all BAU

How Scalemate’s automated creative testing platform runs any method in the library

One engine behind every method — you pick the template, the platform runs the test. Most creative testing tools and ad creative testing platforms stop at a dashboard; this one launches the batch, cuts losers, scales winners, then loops results back so each cycle runs faster and bigger.

1
Pick a method template
Preset structure — the exact ad set count, budget split, and audience for each method.
2
Bulk launch from Google Drive
Drag 30 creatives at once — auto-named and pushed into the right ad sets.
3
Auto-pause on the method's schedule
Losers paused on threshold (CPA, ROAS, IPM) without you logging in.
4
Winners auto-clone to scaling
Hit the promotion threshold and the creative duplicates into your scale campaign automatically.
5
Slack summary every morning
Paused, promoted, burning budget — every test's state, without opening Ads Manager.
6
Sync results back to your own stack
Connect Scalemate to your system and push every test result — winners, losers, hook rates — straight into your database or creative pipeline via API. Your generation tool spins up the next batch from what actually won, and the loop runs itself.

Run your first test · See the 20+ automation rules →

Frequently asked questions

What is creative testing?

Creative testing — also called ad creative testing or creative ad testing — is the process of running multiple ad creatives against each other to find which ones drive the cheapest, highest-quality conversions before you put real budget behind them. On Meta and TikTok it means structured tests like 3-3-3, Hooks Tests, creative A/B testing, dynamic creative testing and multi-variant batteries, judged on CPA, ROAS, IPM or hook rate. This library collects 14 of those methods so you can match one to your budget and goal — and automate it without stitching together separate creative testing tools.

Do I have to use Scalemate to run these methods?

No. Every method is fully documented as a manual setup — run any of the 14 in Ads Manager by hand. The automation is just the optional shortcut: same method, launched from a template and watched for you.

What is the best creative testing framework for Facebook ads?

There isn't one best Facebook creative testing framework — it depends on your goal, budget and platform. 3-3-3 is the common starting point for solo Meta buyers; post-Andromeda teams shipping 20+ creatives a week default to multi-variant battery testing on broad targeting. Use the filters above to match a method to your account.

How many creatives should I test per week in 2026?

For Meta ads creative testing in 2026, Andromeda changed the math. Under $5K/mo you can still run 5–10 across 2–3 ad sets; broad Advantage+ teams ship 10–30 per ad set, and above $50K/mo it's 50–100 a week as fatigue cycles compressed to 7–14 days.

How did Meta's Andromeda update change creative testing in 2026?

Andromeda rewards creative volume and diversity over narrow audience targeting — broad ad sets carrying 10-30 distinct creatives now beat fragmented, hyper-targeted setups. Two practical shifts: test whole concepts, not 3 cuts of one ad (Andromeda's Entity ID dedup collapses near-identical variants into a single delivery slot), and ship fresh creatives weekly because fatigue cycles compressed to 7-14 days. The Multi-Variant Battery and Creative Refresh Cadence methods above are built for exactly this.

Is creative testing on TikTok different from creative testing on Meta?

The frameworks carry over, but the signals don't. TikTok burns through creative faster, rewards native-feeling hooks in the first 1-2 seconds, and early on you lean on hook rate and IPM more than Meta-style CPA windows. Most methods in this library run on both platforms — each card tags whether it applies to Meta, TikTok, or both.

How is creative testing for mobile UA and app campaigns different?

App-install UA judges creatives on IPM, CPI and downstream signals like d3 retention — not just front-end CPA — so cheap, high-volume testing matters even more than in e-commerce. That's why mobile UA teams run cost-optimized frameworks: Cheap Geo (validate in tier-3 / WW geos before spending tier-1 budget), Cheap Geo + AEO, and Mirror-BAU. All three are in the library above, filtered under Mobile UA.

What's the 3-3-3 method in Facebook ads?

3 ad sets × 3 creatives × 3 days — each ad set runs 3 distinct creatives for 72 hours, then you scale the winner and cut the rest. Popularized by Pilothouse; fast and cheap, but 3 days is tight signal on small budgets. Full breakdown in The 3-3-3 Method card above.

Is Meta's Conversion Lift test the same as A/B testing?

No. A/B Test compares two variants that are both running; Conversion Lift uses a true hold-out — some users see no ads — to measure real incrementality. Conversion Lift also isn't self-serve for most accounts; A/B Test is the one in your Experiments menu.

How long should I run a creative test before deciding?

Depends on the method: 72h for 3-3-3, 5 days for the 3-2-2 sprint, 6–12 days for Bulk CBO, 7–14 days for Meta's A/B Test. The common mistake is running the method's cutoff at half the budget it needs — then your early signal is mostly noise.

Set creative testing to autopilot.

Test more. Find winners faster. Learning compounds. Meta + TikTok — free tier, no credit card.

Try Scalemate free · Book a Demo
