14 production-grade methods.
3-3-3, Hooks Test, Multi-Variant Battery, Mirror-BAU, Cheap Geo, Conversion Lift — what real media buyers run.
↳ Free · no email gate
The creative testing methods media buyers actually use across apps, e-commerce and mobile games. Automate the one you pick and make your ads perform better.
“Added new ads to a profitable account. Dropped it overnight — 1.38 → 0.75 ROAS, $18k and 30 days to recover.”
Most creatives never win. A handful carry the whole account, then fatigue in 7–14 days.
Andromeda rewrote Meta overnight. TikTok plays by other rules. Nobody knows how to test creatives in 2026.
Set up, launch, babysit, score, log — by hand, for every test. Skip the last step and the lesson's gone.
Setup, pause thresholds, budget floor, common mistake — the full playbook per method.
Chip filters by goal, budget, platform. No theoretical tests that won't fit your account.
↳ Match in 30 seconds
Setup, pause thresholds, budget floor, common mistake: the full playbook per method.
↳ Manual or automated · your call
Set thresholds once. Launch · pause · promote · refresh · digest: runs automatically for every method.
Pause weak. Clone winners 2×. Refresh fatigue. Overnight.
↳ 20+ rules out of the box
Worst-performer pause Day 3. Top-2 promoted Day 5. No waiting three weeks for signal.
↳ Same gates · every test
Refresh cadence built in. Andromeda gets its 20–30 fresh variants. Nothing rots in BAU.
↳ Frequency · CTR · CPA · hook-rate · 24/7
Stop logging in at midnight to pause an ad set. Spend the hours on new hooks, offers, angles.
↳ One Slack digest · three dashboards gone
Production-tested creative testing methods for Meta + TikTok, sorted by goal, budget and platform. Pick one, run it by hand, or put it on autopilot.
Your first winners in 72 hours. 3 ad sets × 3 creatives × 3 days — the fast, cheap way to find a starter winner. Works across subscription apps, e-commerce and gaming, at any account size.
Meta & TikTok · subscription apps, e-commerce, gaming · $40K+/mo accounts.
Fast · cheap · forces 3 distinct concepts.
Noisy under $100/day. 3 days ends before the learning phase exits. Tests creative only, not audience fit.
3 versions of one ad isn't 3-3-3 — variation's too narrow and Andromeda picks the wrong winner.
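The 3 ad sets × 3 creatives × 3 days layout can be sketched as a small plan builder. This is an illustration only: the names `AdSet` and `build_333_plan` are hypothetical, and the even budget split across ad sets is an assumption the method itself does not state.

```python
from dataclasses import dataclass


@dataclass
class AdSet:
    name: str
    creatives: list[str]
    daily_budget: float
    days: int


def build_333_plan(ad_sets: dict[str, list[str]], total_daily_budget: float) -> list[AdSet]:
    """Lay out a 3-3-3 test: 3 ad sets, 3 distinct creatives each, 3 days."""
    if len(ad_sets) != 3:
        raise ValueError("3-3-3 needs exactly 3 ad sets")
    for name, creatives in ad_sets.items():
        # Duplicates would shrink the test below 3 real options per ad set.
        if len(creatives) != 3 or len(set(creatives)) != 3:
            raise ValueError(f"{name}: needs exactly 3 distinct creatives")
    per_ad_set = total_daily_budget / 3  # assumption: even split
    return [
        AdSet(name, list(creatives), per_ad_set, days=3)
        for name, creatives in ad_sets.items()
    ]
```

The check only catches literal duplicates. Whether three creatives are distinct concepts rather than three cuts of one ad is still a judgment call.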
Stop fighting the algorithm. Load 7–20 creatives — different formats and concepts — into one broad ad set and let Andromeda find the pockets.
Shopping/Sales · broad audiences · post-Andromeda accounts.
Algorithm-aligned. No manual audience-creative matching. Scales — more creatives = more pockets.
Needs 7–20 creatives. Hard to explain wins to stakeholders. Hard to attribute to specific choices.
Loading 20 near-identical creatives. Andromeda needs real diversity (format + concept + angle), not 20 carousels.
A scaling-ready verdict in 5 days, not 14. Best when you're testing communication hypotheses in your creative — 3 different angles, 2 variations each.
Testing creative communication hypotheses · 3 angles × 2 variations · a primary + backup winner in 5 days · $40K+/mo accounts.
Faster than the 14-day version. In-ad-set variation suits Andromeda.
Shorter window = noisier signal. Needs a daily budget of at least 3× your target CPA per ad set.
Throwing random ads into each ad set. Each ad set should test one hypothesis — e.g. same angle, different CTAs — so you learn why the winner won, not just which ad won.
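The budget floor is simple arithmetic. A minimal sketch, assuming one ad set per angle (so 3 ad sets) and the 3× target-CPA rule stated above; the function name and the example CPA are hypothetical.

```python
def min_daily_budget(target_cpa: float, ad_sets: int = 3, multiplier: float = 3.0) -> float:
    """Total daily budget needed so every ad set gets multiplier x target CPA."""
    if target_cpa <= 0 or ad_sets <= 0:
        raise ValueError("target_cpa and ad_sets must be positive")
    return target_cpa * multiplier * ad_sets


# Example: a $25 target CPA means $75/day per ad set, $225/day across 3 ad sets.
daily = min_daily_budget(25.0)
sprint_total = daily * 5  # 5-day sprint
```

If the account cannot fund that floor for all 5 days, the early read is likely to be mostly noise.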
Hook rate decides what spend a creative gets early — the body decides what converts to purchase. Hold a body that already converts, swap hooks, scale the one that unlocks volume.
Video-heavy teams (mobile UA, DTC) · refining a proven concept · $5K+/mo accounts.
Isolates the hook cleanly. Compounds a proven concept. Fast once you have a base winner.
Needs editing capacity. Refines, doesn't discover. Hook-rate ≠ ROAS.
Comparing hook rates across different campaigns instead of within one test — hook-rate is context-dependent.
Settle 'did the ad actually cause that sale?' with finance. A true hold-out measures incremental conversions — but it's gated to enterprise teams with a Meta rep and $30K+ for the study.
Meta-gated · measurement-only — the Lift study runs through Meta, not Scalemate.
Enterprise w/ Meta rep · $75K+/mo accounts · validating a big change before scaling 2–3×.
Highest-quality incrementality Meta offers. Separates 'ad drove this' from 'would've converted anyway.' Ends attribution debates.
Gated — needs a Meta rep. 4+ week cycle. $30K+ for the study. One decision at a time, not a daily tool.
Assuming it's the same as the A/B Test option. A/B compares two live variants; Lift uses a true hold-out. A/B can't measure incrementality.
Test 12–75 creatives a cycle without funding duds. Phase 1 cuts the bottom 50–70% on cheap signal; Phase 2 validates survivors on revenue. For R&D + AI-creative pipelines.
High-volume R&D teams · 30+ creatives/cycle · AI-generation pipelines · $10K+/mo accounts.
Tests volume efficiently (12–75 vs 9). 2-phase saves budget. Built for AI creative pipelines. Matches how CBO allocates.
Needs 12–75 creatives. Phase 1 metric choice is critical — wrong signal cuts real winners. 6–12 day cycle.
Treating Phase 1 metrics as final. High install rate ≠ high purchase rate. Phase 2 exists to weed those out — don't skip it.
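The two-phase cut can be sketched as two filters. This is an illustration under assumptions: the function names, the install-rate signal, the 60% cut and the ROAS floor are all hypothetical choices within the ranges the method describes.

```python
def phase1_survivors(cheap_signal: dict[str, float], cut_fraction: float = 0.6) -> list[str]:
    """Phase 1: drop the bottom 50-70% on a cheap signal (higher is better)."""
    if not 0.5 <= cut_fraction <= 0.7:
        raise ValueError("cut_fraction should sit in the 50-70% range")
    ranked = sorted(cheap_signal, key=cheap_signal.get, reverse=True)
    keep = max(1, len(ranked) - round(len(ranked) * cut_fraction))
    return ranked[:keep]


def phase2_winners(survivors: list[str], roas: dict[str, float], min_roas: float) -> list[str]:
    """Phase 2: keep only survivors that also clear the revenue bar."""
    return [c for c in survivors if roas.get(c, 0.0) >= min_roas]


install_rate = {f"c{i}": 0.1 * (i + 1) for i in range(10)}  # c9 is the Phase 1 leader
survivors = phase1_survivors(install_rate)
winners = phase2_winners(survivors, {"c9": 0.5, "c8": 1.4, "c7": 1.2, "c6": 0.9}, min_roas=1.0)
```

In this made-up data the Phase 1 leader, c9, fails on revenue, which is exactly the case Phase 2 exists to catch.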
Stop guessing whether your offer wants video or static. 3 ad sets — static / video / mixed, same concept — tells you where to point production budget.
New accounts · format transitions · vertical-specific calls (mobile UA, eCom Reels).
Ends the 'video always wins' assumption. Data-backed format strategy.
Needs both formats produced. Result varies by funnel stage (TOFU = video, BOFU = static).
Calling video the winner on engagement when static had better CPA. Optimize for the business metric, not engagement.
Catch fatigue before it eats your CPA. Watch frequency, CTR, CPA and hook-rate on every live creative and swap the moment one breaks. Mandatory post-Andromeda — cycles dropped to 7–14 days.
Any team running winners 30+ days · critical post-Andromeda.
Stops budget waste on dying creative. Holds campaign CPA. Forces a refresh rhythm.
Needs a steady creative pipeline. Maintenance, not discovery.
Replacing with a new concept instead of a variation — fresh concepts need their own test; a variation inherits 70–80% of the winner.
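A fatigue check on the four watched metrics might look like the sketch below. Every threshold here is a placeholder assumption, not a recommended value, and `Snapshot` and `fatigue_flags` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    frequency: float
    ctr: float
    cpa: float
    hook_rate: float


def fatigue_flags(
    baseline: Snapshot, current: Snapshot,
    max_frequency: float = 3.0,   # assumption
    max_ctr_drop: float = 0.25,   # assumption: 25% below baseline
    max_cpa_rise: float = 0.25,   # assumption: 25% above baseline
    max_hook_drop: float = 0.25,  # assumption: 25% below baseline
) -> list[str]:
    """Return the metrics that have broken; any flag means swap the creative."""
    flags = []
    if current.frequency > max_frequency:
        flags.append("frequency")
    if current.ctr < baseline.ctr * (1 - max_ctr_drop):
        flags.append("ctr")
    if current.cpa > baseline.cpa * (1 + max_cpa_rise):
        flags.append("cpa")
    if current.hook_rate < baseline.hook_rate * (1 - max_hook_drop):
        flags.append("hook_rate")
    return flags


flags = fatigue_flags(
    Snapshot(frequency=1.5, ctr=0.020, cpa=20.0, hook_rate=0.30),
    Snapshot(frequency=3.4, ctr=0.012, cpa=27.0, hook_rate=0.29),
)
```

Comparing against the creative's own baseline, rather than an account-wide number, keeps the check meaningful for creatives that started strong or weak.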
The cleanest yes/no on whether a challenger should replace your champion. Equal impressions, then stop — no time or audience bias. Compare by IPM (app) or CVR (web).
Teams with an established winner · iterating a replacement · app teams (IPM), web teams (CVR).
Cleanest head-to-head. Controls time-of-day, day-of-week, audience drift. Equal sample sizes.
Needs manual impression monitoring. Attribution lag delays the call. Not parallelizable (5 challengers = 5 runs).
Stopping one at target but letting the other run 'a bit longer' — re-introduces time bias. If one lags, raise its budget to catch up.
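The stop rule can be sketched as: no verdict until both arms reach the same impression target, then compare IPM (installs per 1,000 impressions). The function names, the 2% tolerance and the sample numbers are assumptions, and the sketch does not test statistical significance.

```python
def ipm(installs: int, impressions: int) -> float:
    """Installs per mille: installs per 1,000 impressions."""
    return installs / impressions * 1000


def verdict(champion: dict, challenger: dict, target_impressions: int,
            tolerance: float = 0.02) -> str:
    """Compare two arms only once both have reached the impression target."""
    floor = target_impressions * (1 - tolerance)
    for arm in (champion, challenger):
        if arm["impressions"] < floor:
            # Raise the lagging arm's budget instead of extending the other.
            return "keep running"
    champ = ipm(champion["installs"], champion["impressions"])
    chall = ipm(challenger["installs"], challenger["impressions"])
    return "challenger" if chall > champ else "champion"


result = verdict(
    {"impressions": 50_000, "installs": 600},
    {"impressions": 50_000, "installs": 700},
    target_impressions=50_000,
)
```

For web tests the same structure applies with CVR in place of IPM.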
Let Meta do the stats. The native Experiments tool splits the audience, runs the test and declares a winner with confidence. For teams without a data analyst.
Teams wanting native attribution + Meta-handled splitting + a clear verdict, no manual setup.
Meta handles the math. Auto-splits to remove overlap bias. Clear 'Variant X won, Y% confidence.'
7-day min even for small tests. Only 2–4 variants. Meta optimizes for confidence, not always your metric.
Setting the wrong primary metric — pick CTR and Meta crowns the high-CTR variant even if its CPA is worse. Set primary = the metric you'll act on.
Test 4–10× more creatives per dollar. Run early tests in T3 geos or WW MAI where CPI is a fraction of T1, then promote winners home. Mobile UA only — and only after correlation is proven.
Mobile UA · $1K+/day per geo · high creative volume · proven cheap-geo→T1 correlation.
4–10× cheaper testing. More creatives per dollar. Fast install volume = fast signal.
Only works when cheap-geo winners correlate with T1. Strong for utilities/casual games, weak for premium/payment-heavy apps.
Promoting T3 winners to T1 without validating correlation. Some win cheap on low-intent T3 audiences and flop in T1. Validate the top 5 first.
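One way to validate correlation is a rank correlation between the same creatives' results in the cheap geo and in T1. The sketch below uses Spearman's rho as an assumed choice of statistic; the sample IPM values are made up, and what counts as "correlated enough" is left to the team.

```python
def ranks(values: list[float]) -> list[float]:
    """Rank values from 1 (lowest), averaging ranks across ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least 2 items")
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rank correlation is undefined when one side is constant")
    return cov / (sx * sy)


# Top 5 cheap-geo creatives: IPM in T3 vs IPM for the same creatives in T1.
rho = spearman([10.0, 8.0, 6.0, 4.0, 2.0], [5.0, 4.0, 3.0, 2.0, 1.0])
```

A rho near 1 means the cheap geo ranks creatives the same way T1 does. A rho near 0 means cheap-geo winners tell you little about T1.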
Cheap-geo prices, AEO-quality signal. Filter creatives on the event that actually matters (d3 retention, first purchase, level-5) without paying T1 CPIs. Needs 50+ AEO events/week.
Mobile UA w/ proven cheap-geo AEO correlation · enough conversion volume for AEO learning.
Cost savings + signal quality. AEO outcomes mean more than raw installs.
AEO needs volume — under ~50 events/week the learning phase never closes. Validate event throughput first.
Running AEO on thin volume. ~50 events/week minimum; at 10/week you get random delivery, not optimization.
Test creatives in the exact conditions they'll have to survive. Clone your BAU campaign, swap in 2–4 new creatives, run 5–7 days in parallel. Priciest per test, most reliable signal — no T3→T1 gap.
Mature mobile UA where cheap-geo correlation failed · creatives that must work in production.
Most reliable signal of any UA method. Winners behave the same when scaled. No correlation gap.
Priciest per creative — full T1/BAU CPI. Not for high-volume iteration (cost scales linearly).
Testing too many at once dilutes BAU and spreads spend thin. Limit to 2–4 per mirror cycle.
Isolate every variant in its own ad set. CBO allocates spend; pause gates fire at $60 (CPI check) and $150 (CPA check). Survivors scale 1.5× and push as new ads into every BAU campaign.
Teams wanting isolated $-gated reads · clean per-variant signal · no in-adset competition.
Clean per-variant read — no ads competing inside an ad set. Spend-based gates adapt to account pace. Survivors land in BAU as proven, not theoretical.
More ad sets = more learning phases to feed. CBO may underspend low-CTR variants before they reach the first gate. Fights Andromeda's 'feed many in one' instinct.
Setting CPI/CPA targets too tight. Use the BAU running average × 1.5, not an aspirational target — otherwise you cut variants that just needed a wider audience pocket.
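The spend gates can be read as a small decision function. This is one interpretation, not a confirmed implementation: it assumes the CPI check applies from $60 of spend, the CPA check from $150, and that both limits are the BAU running average × 1.5. The function name and the sample numbers are hypothetical.

```python
import math

CPI_GATE = 60.0    # spend at which the CPI check fires
CPA_GATE = 150.0   # spend at which the CPA check fires


def gate_decision(spend: float, installs: int, purchases: int,
                  bau_cpi: float, bau_cpa: float, budget: float) -> tuple[str, float]:
    """Return (action, new_budget) for one isolated variant."""
    cpi_limit = bau_cpi * 1.5  # BAU running average x 1.5, not an aspirational target
    cpa_limit = bau_cpa * 1.5
    if spend < CPI_GATE:
        return "run", budget
    cpi = spend / installs if installs else math.inf
    if cpi > cpi_limit:
        return "pause", 0.0
    if spend < CPA_GATE:
        return "run", budget
    cpa = spend / purchases if purchases else math.inf
    if cpa > cpa_limit:
        return "pause", 0.0
    return "scale", budget * 1.5  # survivors scale 1.5x


decision = gate_decision(spend=160.0, installs=50, purchases=4,
                         bau_cpi=3.0, bau_cpa=30.0, budget=100.0)
```

With a BAU CPA of $30 the limit is $45, so a variant at $40 CPA survives and scales. Against a tight aspirational target of, say, $30 it would have been cut.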
One engine behind every method — you pick the template, the platform runs the test. Most creative testing tools and ad creative testing platforms stop at a dashboard; this one launches the batch, cuts losers, scales winners, then loops results back so each cycle runs faster and bigger.
Preset structure — the exact ad set count, budget split, and audience for each method.
Drag 30 creatives at once — auto-named and pushed into the right ad sets.
Losers paused on threshold (CPA, ROAS, IPM) without you logging in.
Hit the promotion threshold and the creative duplicates into your scale campaign automatically.
Paused, promoted, burning budget — every test's state, without opening Ads Manager.
Connect Scalemate to your system and push every test result — winners, losers, hook rates — straight into your database or creative pipeline via API. Your generation tool spins up the next batch from what actually won, and the loop runs itself.
Creative testing — also called ad creative testing or creative ad testing — is the process of running multiple ad creatives against each other to find which ones drive the cheapest, highest-quality conversions before you put real budget behind them. On Meta and TikTok it means structured tests like 3-3-3, Hooks Tests, creative A/B testing, dynamic creative testing and multi-variant batteries, judged on CPA, ROAS, IPM or hook rate. This library collects 14 of those methods so you can match one to your budget and goal — and automate it without stitching together separate creative testing tools.
No. Every method is fully documented as a manual setup — run any of the 14 in Ads Manager by hand. The automation is just the optional shortcut: same method, launched from a template and watched for you.
There isn't one best Facebook creative testing framework — it depends on your goal, budget and platform. 3-3-3 is the common starting point for solo Meta buyers; post-Andromeda teams shipping 20+ creatives a week default to multi-variant battery testing on broad targeting. Use the filters above to match a method to your account.
For Meta ads creative testing in 2026, Andromeda changed the math. Under $5K/mo you can still run 5-10 creatives across 2-3 ad sets; broad Advantage+ teams ship 10-30 creatives per ad set, and above $50K/mo it's 50-100 a week, because fatigue cycles have compressed to 7-14 days.
Andromeda rewards creative volume and diversity over narrow audience targeting — broad ad sets carrying 10-30 distinct creatives now beat fragmented, hyper-targeted setups. Two practical shifts: test whole concepts, not 3 cuts of one ad (Andromeda's Entity ID dedup collapses near-identical variants into a single delivery slot), and ship fresh creatives weekly because fatigue cycles compressed to 7-14 days. The Multi-Variant Battery and Creative Refresh Cadence methods above are built for exactly this.
The frameworks carry over, but the signals don't. TikTok burns through creative faster, rewards native-feeling hooks in the first 1-2 seconds, and early on you lean on hook rate and IPM more than Meta-style CPA windows. Most methods in this library run on both platforms — each card tags whether it applies to Meta, TikTok, or both.
App-install UA judges creatives on IPM, CPI and downstream signals like d3 retention — not just front-end CPA — so cheap, high-volume testing matters even more than in e-commerce. That's why mobile UA teams run cost-optimized frameworks: Cheap Geo (validate in tier-3 / WW geos before spending tier-1 budget), Cheap Geo + AEO, and Mirror-BAU. All three are in the library above, filtered under Mobile UA.
3 ad sets × 3 creatives × 3 days — each ad set runs 3 distinct creatives for 72 hours, then you scale the winner and cut the rest. Popularized by Pilothouse; fast and cheap, but 3 days is tight signal on small budgets. Full breakdown in Method 01 above.
No. A/B Test compares two variants that are both running; Conversion Lift uses a true hold-out — some users see no ads — to measure real incrementality. Conversion Lift also isn't self-serve for most accounts; A/B Test is the one in your Experiments menu.
Depends on the method: 72h for 3-3-3, 5 days for the 3-2-2 sprint, 6-12 days for Bulk CBO, 7-14 days for Meta's A/B Test. The common mistake is running the method's cutoff at half the budget it needs — then your early signal is mostly noise.
Test more. Find winners faster. Learning compounds. Meta + TikTok — free tier, no credit card.