What is A/B testing?
A/B testing is splitting website visitors over different variations of the same page(s) and analyzing the impact on your KPIs and user behavior.
For instance, for my home page, I could test the header copy (see image below). 50% of the website visitors will still see the original version, but the other 50% will see the variation with a different header text. Next, in the data, I can see which header resulted in the highest number of conversions.
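To make the 50/50 split concrete, here is a minimal sketch of how a visitor could be assigned to a version. It assumes you have a stable visitor ID (for example from a cookie). The function name `assign_variation` and the experiment name are made up for illustration; real testing tools have their own bucketing logic.

```python
import hashlib


def assign_variation(visitor_id: str, experiment: str = "home-header-copy") -> str:
    """Deterministically assign a visitor to 'original' or 'variation' (50/50)."""
    # Hash the experiment name together with the visitor ID, so the same
    # visitor always sees the same version within this experiment.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # a number from 0 to 99
    return "original" if bucket < 50 else "variation"


# Quick check of the split over 10,000 hypothetical visitors.
counts = {"original": 0, "variation": 0}
for i in range(10_000):
    counts[assign_variation(f"visitor-{i}")] += 1
print(counts)
```

Hashing instead of picking at random on every page view keeps the experience consistent for returning visitors.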
To set up an A/B test, you need a testing tool, knowledge of data and statistics, and preferably coding knowledge.
Benefits of A/B testing
Data shows that only 25% of changes we make to our websites and digital products positively impact the user experience and website goals. This means the other 75% make no difference or even hurt your goals. Therefore, A/B testing is crucial.
It is essential to remember that A/B testing is not a goal. It is a means to an end, and when done correctly, your business will experience more alignment with its users and growth.
Know what works
With A/B testing, you know what works and what does not work. Instead of implementing all changes, you only implement winning changes, resulting in customer satisfaction, more revenue and business growth.
Learn about your users
A/B testing is also a form of research, as in the academic world. You learn about your customers’ needs, motivations, and behaviors. This allows you to understand them better and optimize your business.
A/B testing helps avoid the risks of making changes that negatively impact user experience. It is also more cost-efficient, as it helps identify the most effective changes, saving time and resources.
Other forms of testing
Besides A/B tests there are several other experiments you can run to validate your ideas and hypotheses.
A/B/n testing is an A/B test with more variations. For an A/B/n test, all the same principles apply as for an A/B test. The advantage is that you can test more variations at once. The downside is that you need a lot more traffic and must adjust your statistics for the extra comparisons.
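One common way to adjust the statistics is a Bonferroni correction, which is my choice for this sketch and not the only option. The example shows how the chance of at least one false positive grows with the number of variations, and the stricter threshold you could use per comparison. It assumes the comparisons are independent, and the function names are made up for illustration.

```python
def familywise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Stricter per-comparison threshold that keeps the overall rate near alpha."""
    if comparisons < 1:
        raise ValueError("comparisons must be at least 1")
    return alpha / comparisons


# Three variations, each compared with the original at a 5% threshold.
print(familywise_error_rate(0.05, 3))  # roughly 0.14 instead of 0.05
print(bonferroni_alpha(0.05, 3))       # roughly 0.0167 per comparison
```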
Multivariate testing (MVT)
User testing is useful when you have insufficient traffic for A/B testing or want to validate a big idea before developers spend many hours on it. Great user tests are 5-second testing, preference testing, and (prototype) usability testing.
Split URL / Redirect test
In this case, you send 50% of users to a different page. It can be useful when you have a lot of changes in your variation, when testing new features, or new landing pages. I am not a big fan of these tests as they often cause (data) problems.
Smoke / Fake Door test
In a smoke (or fake door) test, you show the user a product, service, or feature on your website that does not exist yet. It helps rapidly validate an idea before putting many coding hours into it. However, you do trick your visitors, so don’t do this test too often.
When you have a long list of test ideas, you need to prioritize them. There are several frameworks for this.
PIE: Potential, Importance, and Ease. The potential is how much improvement you expect the change will make. The importance is how much traffic will see this change and how valuable the traffic is. Finally, ease is the ease of implementing and setting up this test. Give them a score from 1 to 10. The average is the PIE score.
ICE: Impact, Confidence, and Ease. Impact is the effect you expect if this change works. Confidence is how confident you are that it will work. Ease is how easy the change is to implement. Like with PIE, give each a score from 1 to 10 and take the average.
PXL: Created by Peep Laja. This framework is based on facts. For instance, is the change noticeable within five seconds? Is the change above the fold? Did this idea come from user testing or data analysis?
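The PIE and ICE averages described above are simple to compute. Here is a small sketch that scores and ranks two test ideas. The ideas and their scores are invented for illustration.

```python
from statistics import mean


def average_score(*scores: float) -> float:
    """Average of the three 1-10 scores (PIE or ICE)."""
    for score in scores:
        if not 1 <= score <= 10:
            raise ValueError("each score must be between 1 and 10")
    return mean(scores)


# Hypothetical ideas with (potential, importance, ease) scores.
ideas = {
    "Rewrite header copy": (7, 9, 8),
    "Redesign checkout": (9, 6, 3),
}

# Highest average first.
ranked = sorted(ideas, key=lambda name: average_score(*ideas[name]), reverse=True)
for name in ranked:
    print(name, average_score(*ideas[name]))
```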
The best thing about these frameworks is their simplicity, but they have their downsides. PIE and ICE are fully subjective, and barely based on facts. However, the most important downside of these frameworks is the huge lack of evidence.
For instance, in the PXL model, ideas related to issues found in qualitative feedback get a higher prioritization score. However, this might not lead to better experiments. Perhaps in your situation, ideas related to qualitative feedback have a low win rate. Still, you consistently give these ideas a higher prioritization score, which significantly lowers your experimentation win rate!
Therefore, I created an evidence-based prioritization framework, partly automated, with a feedback loop. Check the video to get step-by-step instructions on how to set it up.
A/B testing tools
There are many testing tools on the market. When working on experiments, you will likely spend a lot of time with these tools. Therefore, make sure you choose the right one.
The most popular tools are:
- Convert: A low-cost, high-quality testing tool with great customer support.
- VWO: Priced around the middle of the market. Besides setting up A/B tests, this tool also offers heatmaps, recordings, surveys, and documentation.
- Optimizely: The enterprise tool out there. The price is by far the highest, but it comes with many features for advanced testing.
Some other popular tools out there are ABTasty, Kameleoon, and Omniconvert. But there are many more.
Thanks to the growing popularity of my CRO courses and CRO Tips newsletter, I’ve secured exclusive deals with several great tool vendors.
Setting up an A/B test
In every client-side testing tool, there are two ways to make changes in the variation of your tests.
The first method is to make changes in the visual editor. This is pretty easy and straightforward. However, the testing tools will write their own code, and in general, this is not the best code. For very easy changes, editing through the visual editor might work, but for larger tests with bigger changes, it will likely not work on some browsers and screen sizes, therefore making your test less trustworthy and giving your visitors a bad experience.
You can learn to write code for A/B testing quite quickly with my Coding for A/B testing course on Udemy. You can find it with a big discount on my course page.
Analysis and statistics in A/B testing
Before creating your A/B test, you must know if you have sufficient traffic and conversions, and how long the test has to run. You can use this calculator by Speero.
For the calculator, you need to understand two terms:
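- Confidence level: How certain you want to be that a measured difference is not just chance. With a confidence level of 95%, you accept a 5% chance of a false positive when there is no real difference.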
- Power: The likelihood of a significance test detecting an effect when there is one. Common practice is to use a power of 80%.
Aim for a Minimal Detectable Effect (MDE) below 10% (preferably 5% or lower).
Minimal Detectable Effect
The MDE is the minimum difference that can be reliably detected in your test results. With an MDE of 5%, you can find a statistically valid uplift of 5% or higher in your experiment. When the real uplift is below 5%, the test is underpowered: you are more likely to miss the uplift (a type II error, or false negative), and a significant result is more likely to be a false positive (type I error).
The lower your MDE, the higher the impact you can make with your A/B tests, because a low MDE means you can detect smaller significant changes. What is a good MDE?
- < 5%: Perfect for finding small significant uplifts in your conversion rates.
- 5 – 10%: Good for finding bigger uplifts. You need to make big changes in your variation, preferably above the fold.
- >10%: Not ideal. Make huge changes and make sure your research gives you solid evidence that these changes will work.
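To get a feeling for how the MDE, confidence level, and power drive the traffic you need, here is a rough sketch of a pre-test calculation. It uses the textbook normal approximation for comparing two conversion rates with a two-sided test, and treats the MDE as a relative uplift. Those are my assumptions, so a calculator like Speero's may give slightly different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def visitors_per_variation(baseline_rate: float, relative_mde: float,
                           confidence: float = 0.95, power: float = 0.80) -> int:
    """Approximate visitors needed per variation to detect a relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # conversion rate you hope to detect
    if not 0 < p1 < 1 or not 0 < p2 < 1 or relative_mde <= 0:
        raise ValueError("rates must be between 0 and 1 and the MDE positive")
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical 3% baseline conversion rate with different MDEs.
for mde in (0.05, 0.10, 0.20):
    print(f"MDE {mde:.0%}: {visitors_per_variation(0.03, mde)} visitors per variation")
```

Running it shows why a low MDE is only realistic with a lot of traffic: halving the MDE roughly quadruples the visitors you need.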
Sample Ratio Mismatch
In a 50/50 test, each variation should receive about half of your visitors. There will always be a small difference due to chance. But when the difference gets (statistically) too big, there’s a problem. This is called a Sample Ratio Mismatch (SRM).
An SRM check calculates whether the split of visitors over your variations deviates too much from the split you planned. When it does, your A/B test and its data could be flawed.
Solving an SRM error can be quite tricky. First, re-run the experiment. If the problem persists, get a technical web analyst and developer involved.
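An SRM check is usually a chi-square test on the visitor counts. Below is a minimal sketch for two variations. The 0.01 threshold is an assumption on my part (a strict cut-off is common for SRM checks), and the visitor numbers are invented.

```python
from math import erfc, sqrt


def srm_p_value(visitors_a: int, visitors_b: int, expected_share_a: float = 0.5) -> float:
    """P-value of a chi-square test (1 degree of freedom) on the visitor split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, the chi-square tail equals erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


def has_srm(visitors_a: int, visitors_b: int, threshold: float = 0.01) -> bool:
    """Flag a Sample Ratio Mismatch when the split is very unlikely by chance."""
    return srm_p_value(visitors_a, visitors_b) < threshold


print(has_srm(5000, 5050))  # small difference
print(has_srm(5000, 5400))  # large difference
```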
Before you analyze your test, you need to decide: Frequentist or Bayesian.
- Frequentist: Looks at statistical significance.
- Bayesian: The probability that B is better than A.
As probability is much easier to understand, I prefer the Bayesian method. With the probability, you can make a risk assessment. At what probability will you implement the variant? I generally use >80%.
After making this decision, place your data in a calculator, and find out if your experiment is a winner or not.
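If you are curious what a Bayesian calculator does under the hood, here is a simplified sketch. It assumes a uniform Beta(1, 1) prior and estimates the probability that B is better than A by drawing random samples. Real calculators may use other priors or exact formulas, and the visitor numbers below are invented.

```python
import random


def probability_b_beats_a(visitors_a: int, conversions_a: int,
                          visitors_b: int, conversions_b: int,
                          draws: int = 20_000, seed: int = 42) -> float:
    """Monte Carlo estimate of P(conversion rate B > conversion rate A)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    wins = 0
    for _ in range(draws):
        # Posterior of each rate: Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conversions_a, 1 + visitors_a - conversions_a)
        rate_b = rng.betavariate(1 + conversions_b, 1 + visitors_b - conversions_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


probability = probability_b_beats_a(10_000, 300, 10_000, 360)
print(f"Probability that B is better than A: {probability:.1%}")
print("Implement B" if probability > 0.80 else "Do not implement B")
```

The last line applies the >80% threshold mentioned above. Pick your own threshold based on how much risk you are willing to take.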
Binomial & non-binomial
A binomial KPI has only two outcomes per visitor, such as converting or not converting. A non-binomial KPI can take any value from 0 upward. Order value, page views, and sessions are examples of this.
Different formulas are used to calculate significant differences.
There are two ways to calculate the significance of non-binomial KPIs:
- Get the whole dataset (not averages or totals). This can be the revenue of each order or the page views for each visitor. Then use a calculator like Blast.
- Make it a binomial KPI. For instance, set the KPI to visitors with at least 3 pageviews or visitors with an order value above €100.
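The second option can be sketched in a few lines: turn the value per visitor into a yes/no outcome with a threshold. The page view numbers are invented, and `to_binomial` is just a name I picked for this example.

```python
from typing import Iterable, Tuple


def to_binomial(values: Iterable[float], threshold: float) -> Tuple[int, int]:
    """Return (visitors, successes), where a success is a value at or above the threshold."""
    values = list(values)
    successes = sum(1 for value in values if value >= threshold)
    return len(values), successes


# Hypothetical page views per visitor; success = at least 3 page views.
pageviews_per_visitor = [1, 4, 2, 3, 7, 1, 3]
visitors, successes = to_binomial(pageviews_per_visitor, threshold=3)
print(visitors, successes)  # 7 visitors, 4 with at least 3 page views
```

The resulting visitors and successes can go into any calculator built for conversion rates.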
False positive / negative
With every experiment, you will have the data measured in the experiment and what will happen in reality after implementing the change.
With a false positive, you have a winning experiment, but it does not result in a conversion increase in reality.
With a false negative, you do not have a winning experiment, but it would lead to an uplift in reality.
If you have sufficient data, you will decrease the chance of running into false positives and negatives. Therefore, you must do your pre-test calculations.
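To see false positives in action, you can simulate experiments where both versions have exactly the same true conversion rate, so every "winner" is false by definition. This is a rough sketch using a two-sided z-test for two conversion rates. The 10% conversion rate, the visitor numbers, and the number of simulated experiments are assumptions for illustration.

```python
import random
from math import erfc, sqrt


def two_sided_p_value(n_a: int, conv_a: int, n_b: int, conv_b: int) -> float:
    """P-value of a two-proportion z-test with a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if standard_error == 0:
        return 1.0  # no conversions (or only conversions): nothing to compare
    z = (conv_b / n_b - conv_a / n_a) / standard_error
    return erfc(abs(z) / sqrt(2))


def false_positive_rate(true_rate: float = 0.10, visitors: int = 1000,
                        experiments: int = 400, alpha: float = 0.05,
                        seed: int = 1) -> float:
    """Share of simulated A/A experiments that look significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(experiments):
        # Both versions convert at the same true rate.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors))
        if two_sided_p_value(visitors, conv_a, visitors, conv_b) < alpha:
            false_positives += 1
    return false_positives / experiments


rate = false_positive_rate()
print(f"{rate:.1%} of experiments without a real difference looked significant")
```

With a 95% confidence level you should see a share somewhere around 5%, which is the risk you accept up front.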
Testing on a low traffic website
You still can and should test your ideas even if you do not have enough visitors and conversions for statistically valid A/B testing. There are two things you can do with a low-traffic website.
1. Set the KPI to a micro-conversion. For example, if you have an A/B test running on your product detail page, you can track the number of cart visits instead of the number of transactions. As you have many more website visitors in your cart compared to website visitors on your thank you page, you need fewer visitors to make your A/B test statistically valid.
Setting the KPI to two steps down the funnel is even better. If a visitor takes two steps towards the final conversion, it does show a stronger intention to buy eventually.
2. Apply user testing. You can implement the changes and interview users, have a survey on your website, or conduct a five-second or preference test. You could also conduct usability tests to validate your ideas. First, conduct a usability test on the current website. Next, implement the changes, perhaps in a test environment, and run another usability test. Now, analyze the differences between the two tests.
I love to use the tool called Lyssna for my user tests. You can get an extended free trial using the link on my CRO tools page.
Draw learnings from A/B tests
After every A/B test, answer the following questions:
- What do the results tell us about the initial hypothesis?
- If the test wasn’t a winner, was the hypothesis off, or did the execution fall short?
- What insights do we get when combining these results with our existing knowledge from previous tests and research? Is it consistent with previous findings? Did you learn something new?
- What might these insights suggest about our customers’ needs, motivations, and behavior?
- What new experiments can we design based on these learnings?
To ensure these questions are top of mind when analyzing A/B tests, I include them in reporting templates like Airtable (see image).
Client-side & Server-side testing
- Easy to install
- Advanced experimentation is very difficult or impossible (e.g., changing the sequence of steps in the checkout, product features, and algorithms)
- There is a chance of the flicker effect (flash of original content)
- Can be challenging for websites built on a single-page application or with many dynamic elements
- Cookies used by client-side testing tools become less useful due to privacy regulations and deletion
- It could impact the performance of your website due to adding an extra snippet
These downsides, of course, never outweigh all the advantages you get from A/B testing. Client-side A/B testing is still growing rapidly in many markets. As it is easy to install and to set up tests, it is perfect for running many great A/B tests.
With server-side experimentation, the changes are loaded on the server before being sent to your visitor’s browser. It is gaining popularity and is mainly used by more mature organizations.
- Every experiment is possible
- It uses server-side first-party cookies, so privacy regulations are less limiting
- No flickering effect
- The preferred method for developers
- Winning variations can be implemented immediately as the code is already on the server
- It is a lot harder to set up experiments, as you have to use the website’s coding language. This means development teams have to set up all experiments, and you need sufficient capacity for that.
- Using existing solutions can become quite costly. When building your own server-side testing tool, you need developers to maintain and update it.
Server-side testing requires a high CRO maturity within the organization to succeed. Most companies are not there yet, but server-side experimentation is the future.
However, it will still take many years before it becomes the standard. Until then, client-side experimentation is becoming more and more standard practice for every company around the globe.
A/B test reports
Creating a nice-looking report from your test results is helpful for two reasons. The first is for documentation purposes, and the second is to update the organization and make them enthusiastic about experimentation.
If you make a report from your A/B test, it is important to keep it simple. Never copy and paste raw data from Google Analytics into your report. Instead, make your reports nice and easy on the eye.
A report should consist of:
- Reason for test and test hypothesis
- Setup of the test (duration, segmentation, and KPIs)
- Results for each main KPI
- Business case
- (Optional: segmentation and other useful insights)
- Learnings & recommendations
- Your name
When you update your colleagues, be smart and use your experimentation mindset. Some colleagues might want to see every A/B test you did. Higher management might want a monthly or quarterly update, and yet others want to be updated during a lunch-and-learn session, for instance. And for some, apply gamification, like a “which test won?” competition. See what works for whom in your organization to get as many colleagues on board as possible.