A/B testing checklist: before, during, and after your experiment
This checklist covers the full lifecycle of an A/B test — from identifying what to test through analyzing results and documenting learnings. Use it to ensure you do not skip critical steps that affect the reliability of your results.
Before the test
- Identify the page or feature to test, based on analytics data showing a problem (high bounce rate, low conversion, funnel drop-off)
- Review qualitative research (usability tests, surveys, support tickets) for evidence of why the problem exists
- Write a hypothesis: “Changing [element] to [new version] will [change] [metric] because [reason]”
- Choose the primary metric that determines the winner
- Choose 1-2 guardrail metrics to detect unintended consequences
- Calculate the required sample size using a sample size calculator (inputs: baseline metric value, minimum detectable effect, a significance level such as 95%, and statistical power, commonly 80%); see the sketch after this list
- Estimate the test duration based on daily traffic and required sample size (minimum 2 weeks so full weekly cycles are covered, ideally 4)
- Design and build the variant, changing only one element
- QA the variant across Chrome, Safari, Firefox, Edge on desktop, tablet, and mobile
- Verify that tracking fires correctly for both control and variant (check analytics events)
- Run a brief internal pilot (a few hours) to confirm data collection works
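A minimal sketch of the sample-size and duration steps in Python, using statsmodels. Every numeric input below is an illustrative assumption: a 4% baseline, a 0.5-point absolute lift as the minimum detectable effect, 80% power, and 3,000 visitors per day. Your real numbers will differ.

```python
# Sample size per arm for a two-proportion test, then a duration estimate.
# All numeric inputs are illustrative assumptions, not recommendations.
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.040   # current conversion rate: 4%
mde = 0.005        # minimum detectable effect: +0.5 percentage points absolute
alpha = 0.05       # the 95% significance level from the checklist
power = 0.80       # 80% power is a common default (assumption)

# Cohen's h effect size for the two proportions, then solve for n per arm.
effect = proportion_effectsize(baseline + mde, baseline)
n_per_arm = math.ceil(NormalIndPower().solve_power(
    effect_size=effect, alpha=alpha, power=power, alternative="two-sided"))

daily_visitors = 3000   # visitors entering the test per day (assumption)
days = math.ceil(2 * n_per_arm / daily_visitors)
print(f"{n_per_arm:,} visitors per arm; run at least {max(days, 14)} days")
```

Round the result up, never down, and re-run the numbers whenever your baseline estimate changes.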
During the test
- Launch the test with the planned traffic split (typically 50/50); a deterministic bucketing sketch follows this list
- Resist checking results daily; peeking at interim results inflates the false-positive rate, so set a review date at the planned test end
- Monitor for technical issues only (broken rendering, tracking failures) during the test period
- Note any external factors that occur during the test (promotions, holidays, press coverage, outages)
- If a critical bug is found in the variant, pause the test, fix the bug, discard the data collected so far, and restart
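One common way to implement the stable 50/50 split from the launch step is deterministic hashing, so a user lands in the same arm on every visit and repeat sessions don't contaminate the data. This is a generic sketch, not the API of any particular testing tool; the experiment name and user ID format are placeholders.

```python
# Deterministic 50/50 bucketing: hash the experiment name together with the
# user ID so each user always gets the same arm, and different experiments
# split users independently of one another.
import hashlib

def assign_arm(user_id: str, experiment: str = "homepage-cta-test") -> str:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100   # uniform bucket in 0..99
    return "variant" if bucket < 50 else "control"

print(assign_arm("user-12345"))   # same user, same arm on every call
```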
After the test
- Verify the test reached both the required sample size and minimum duration
- Check the primary metric: is the result statistically significant (p < 0.05, or over 95% posterior probability of the variant winning in a Bayesian analysis)? See the first sketch after this list
- Assess practical significance: is the effect size large enough to matter for the business?
- Segment results by device, traffic source, and new vs. returning users (see the second sketch after this list)
- Check guardrail metrics for unintended negative effects
- Document the result: hypothesis, what changed, sample sizes, conversion rates, confidence interval, segment findings
- Make a decision: ship the variant, keep the control, or iterate with a follow-up test
- Share the documented result with the team (even if the test lost — losing tests contain learning)
- Plan the next test based on what was learned
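For the frequentist version of the significance check, a two-proportion z-test plus a Wald interval on the absolute lift yields both the p-value and the confidence interval the documentation step calls for. A minimal sketch with made-up counts; a Bayesian analysis would report a posterior probability instead.

```python
# Two-proportion z-test on the primary metric, plus a Wald 95% CI on the
# absolute lift. The counts below are made-up placeholders.
from scipy.stats import norm
from statsmodels.stats.proportion import proportions_ztest

conversions = [460, 410]     # [variant, control]
visitors = [10_000, 10_000]  # per arm

z_stat, p_value = proportions_ztest(conversions, visitors)
p_var, p_ctl = (c / n for c, n in zip(conversions, visitors))
lift = p_var - p_ctl
se = (p_var * (1 - p_var) / visitors[0] + p_ctl * (1 - p_ctl) / visitors[1]) ** 0.5
half = norm.ppf(0.975) * se
print(f"lift {lift:+.4f}, 95% CI [{lift - half:+.4f}, {lift + half:+.4f}], p = {p_value:.4f}")
```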
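And a sketch of the segment breakdown with pandas, where the inline frame and its column names stand in for whatever your analytics tool exports. Treat any segment-level difference as a lead for a follow-up test rather than a conclusion, since slicing the data multiplies the chance of a spurious result.

```python
# Per-segment conversion rates by arm. df stands in for your exported
# visitor-level test data; column names are assumptions.
import pandas as pd

df = pd.DataFrame({
    "arm":       ["control", "variant", "control", "variant", "variant", "control"],
    "device":    ["mobile",  "mobile",  "desktop", "desktop", "mobile",  "desktop"],
    "converted": [0, 1, 1, 1, 0, 0],
})

by_segment = (df.groupby(["device", "arm"])["converted"]
                .agg(visitors="count", rate="mean")
                .round(3))
print(by_segment)
```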