# A/B Testing Guide
A/B testing is the art and science of testing alternative versions of a design. Here we cover what it is, what you can test, and when you can test it.
A/B testing is about comparisons. Version A is compared with version B, in most cases using a platform that simplifies the process of making the comparison.
# Traditional A/B Testing
Historically, A/B testing was developed to test small changes to websites in a live production environment. Some users would be shown version A, and some users would be shown version B using some server tricks. If a larger share of version A's users convert than version B's, then version A is assumed to be a better design than version B. It doesn't matter much why version A is better since it converted more people. Often when people talk about A/B testing, this narrow historical meaning of A/B testing is assumed.
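To make the "server tricks" a little more concrete, here is a minimal sketch of one common way to split traffic: hash each user ID so the same person always lands in the same version, then compare conversion rates. The function names, the experiment label, and the tallies are all hypothetical, chosen only for illustration; real platforms handle this for you and differ in the details.

```python
import hashlib

def assign_version(user_id: str, experiment: str = "homepage-test") -> str:
    """Deterministically place a user in version A or B (roughly a 50/50 split)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of visitors who converted; 0.0 when there were no visitors."""
    return conversions / visitors if visitors else 0.0

# Hypothetical tallies, for illustration only
rate_a = conversion_rate(120, 2400)
rate_b = conversion_rate(90, 2350)
winner = "A" if rate_a > rate_b else "B"
```

Hashing rather than flipping a coin on every visit keeps a returning user in the same version, which is usually what you want in a live test.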
# The A/B Testing Tool Landscape
Indeed, if you wish to test the performance of alternative versions of designs in a live production environment, with real customers, you have many options. Companies like Optimizely, HubSpot, and Hotjar are just a few of the big players supporting this kind of A/B testing.
# Beyond Production Testing
However, with many of these tools, you're a little out of luck if you have things you want to compare but don't want to test in a live production environment. Say you're on a cross-disciplinary team doing a UX design process, and you've got multiple versions of something (in whatever form, prototype or otherwise) that you'd like to test. If that sounds like you, then this guide is for you.
For this kind of A/B testing, you have fewer options. Tools like UserZoom exist, but have a hefty price tag and can be a challenge to configure.
# A/B Testing Characteristics
When you're setting up an A/B test, you're creating an experiment to determine which version is better. Most A/B tests:
- Have a relatively large sample size
- Use test subjects (the users) who are unaware of what's being tested
- Involve variations between versions that can be small and quite subtle
It's easy to grasp how these properties of A/B tests work together. You're probably going to need a good-sized sample if what you're testing is subtle. As you learned in your introductory stats class, larger samples are generally better at accurately detecting subtle effects. And, since you're likely testing something subtle, it's not going to help much if your test subjects are aware of that slight variation. Because it's subtle (like a different navigation label or different design treatment), test subjects could have a hard time understanding what you're asking them to see.
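The link between subtlety and sample size can be sketched with the textbook normal-approximation formula for comparing two proportions. This is a rough illustration, not a prescription: the function name is hypothetical, and the significance level (0.05) and power (0.8) are conventional defaults assumed here rather than anything this guide specifies.

```python
import math
from statistics import NormalDist

def sample_size_per_version(p_a: float, p_b: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants needed per version to tell rates p_a and p_b apart."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = NormalDist().inv_cdf(power)           # desired chance of detecting the gap
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)

# A subtle difference needs far more people than an obvious one
subtle = sample_size_per_version(0.10, 0.11)
obvious = sample_size_per_version(0.10, 0.20)
```

Because the difference between versions is squared in the denominator, halving the effect you hope to detect roughly quadruples the sample you need.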
To put it another way, designers obsess about details that ordinary mortals often miss. We know the subtle difference matters, though! That's what design is all about.
A/B testing can be the right tool to test multiple versions with subtle differences between the versions.
# When A/B Testing Isn't the Right Tool
If those are the properties that most A/B tests have in common, it's worth thinking carefully about when an A/B test doesn't best serve your research question. In general, we suggest that A/B tests aren't what you want when:
- You don't have alternative versions.
- You do have versions, but the differences are quite large and noticeable.
- You're interested in things that aren't about "what version is best."
The first point may be obvious. If you don't have versions, you can't A/B test. Not every design research problem is about versions. To put it another way: don't just produce multiple versions because someone thinks A/B testing is the shiz. You've likely got other more significant problems to tackle first, things that you could handle through UX strategy techniques, for example.
# Multiple Versions Don't Always Require A/B Tests
Having multiple versions doesn't mean you have to run an A/B test. Suppose you have various versions, but the differences between them are significant and easy to see. In that case, you'll likely be better served by a balanced comparison test, which lets your testers see every version and tell you directly which one they prefer and why. Varying the order in which versions are shown reduces ordering bias (the balanced part of balanced comparison).
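The balancing in a balanced comparison can be sketched as rotating participants through every possible presentation order, so no version benefits from always being seen first. This is a minimal sketch with hypothetical names and participant IDs; a testing platform would normally do this for you.

```python
from itertools import permutations

def balanced_orders(versions, participants):
    """Cycle through every presentation order so each is used about equally often."""
    orders = list(permutations(versions))
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}

# Hypothetical participant IDs
schedule = balanced_orders(["A", "B"], ["p1", "p2", "p3", "p4"])
# p1 sees A then B, p2 sees B then A, and so on
```

With more than three or four versions the number of possible orders grows quickly, so full rotation like this only stays practical for small comparisons.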
# Avoiding A/B Testing Mistakes
Finally, doing a simple user test is often your best choice for evaluating designs. User testing is an extremely powerful way to detect problems you're unaware of, which are often—if not always!—lurking in your designs. A/B tests assume you've conceptualized and built out the alternative versions. With A/B testing—whatever form it takes or tool you use—you're testing something that you've thought of, something you think you want to test! All of the folly of thinking you know something applies.
# A/B Testing Gaps
While traditional A/B testing solved a real problem that web developers face (being able to test alternative versions on live websites to see what wins), traditional testing leaves open several gaps.
- You can't test prototypes.
- You can't test apps.
- You have to use your actual customers.
- You've got to have real web traffic to generate your data.
- You've got to be comfortable using your customers as testers.
If you're an early-stage startup, you may not have many website visitors. If you're building an app and deploying it to the Apple App Store or Google Play Store, you're not likely to want to embed code in it to run an A/B test. It's hard enough building an app in the first place, and ideally, by the time it hits the store, you've sorted out all of those tricky design issues.
So traditional A/B testing doesn't have much to offer a UX design process since it usually happens only on websites and is performed after you've designed and built the alternative versions and deployed them on a live site.
# A/B Testing and UX Design
We prefer a definition of A/B testing that goes beyond testing websites in a production environment. Instead, we think it is useful to consider A/B testing as something you would do anytime you need to test multiple design versions.
When defined this way, A/B testing can have more to offer throughout the UX design process. Let's look at some of the kinds of questions you can answer when you broaden the scope beyond traditional A/B testing a bit.
# A/B Testing Questions
Here are just a few of the things you can do with our approach to A/B testing:
- Test the efficacy of alternative navigation designs in a website prototype
- Determine which visual treatment people prefer
- Test different calls-to-action on an app prototype home screen
- Test different explainer videos on a home page
- Test alternative home page images and value propositions
- Test different home page layouts
The variations are many, and if you've got multiple versions, think about A/B testing. Don't limit yourself to making comparisons in a live production environment. And don't limit yourself to websites. The big idea is that you're building prototypes. You should be able to A/B test them if you want to.
Remember, subtlety is key. Are the things we want to test subtle, and something people would have a hard time noticing? If so, then you've likely got a good candidate for an A/B test. If not, other methods, like user testing, might be your go-to. Remember, non-designers don't see the same things designers do, so try to use that as a general guide when thinking through whether or not to A/B test.
# Designing for A/B Tests
As we've already said, don't A/B test just because you can. Instead, think strategically. Is this problem that we're facing something that has alternative approaches? Is determining your design direction something the team can't agree on? Is this process critical now, or can we figure it out later with a more classic A/B testing approach on our deployed pages?
If you feel you should produce alternative versions, you can do so without leaving a prototyping tool like InVision or Figma. It's not time to get the developers involved. It's time to be lean and iterate.
# Getting Your Prototype Ready
As we say elsewhere, creating a prototype for collaboration and sharing among your team is different from creating one for testing with real people. Build out the flows so that people can interact with the prototype naturally. For A/B tests, have entirely separate flows or versions for A, B, and maybe even a C or D should you have that much variation. If screens are shared, just cut and paste from another flow, changing what needs to be changed.
Once you have your separate flows, create a different public share link for each flow. These share links will correspond to each version (A, B, etc.) that each separate group of people will respond to in your A/B test.
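If you are recruiting and routing participants yourself, splitting them evenly across the share links might look something like the sketch below. The URLs are placeholders and the function name is hypothetical; a testing platform would typically handle this step.

```python
import random

# Placeholder share links, one per version
SHARE_LINKS = {
    "A": "https://example.com/prototype-a",
    "B": "https://example.com/prototype-b",
}

def assign_groups(participants, versions, seed=None):
    """Shuffle participants, then deal them out so every version gets an equal share."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: versions[i % len(versions)] for i, p in enumerate(shuffled)}

participants = [f"p{n}" for n in range(1, 61)]
groups = assign_groups(participants, list(SHARE_LINKS), seed=42)
links = {p: SHARE_LINKS[version] for p, version in groups.items()}
```

Each participant sees exactly one version here, which is what distinguishes this setup from a comparison where everyone sees all versions.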
See our how-to guide to testing prototypes to learn how to create a share link on the big prototyping platforms.
# Settling Scores
Ever have disagreements about which design direction to take among the team? Of course you have! Having a team with strong opinions means you have a strong team. Reasonable people can agree to disagree (and run an A/B test to see which approach works best)!
# Measuring Preferences
Since traditional A/B testing is hidden from the user, you can't ask people to tell you about their experience. So you're a bit in the dark as to why one version performed better. But it doesn't have to be that way. When you run your A/B test on a platform like SoundingBox, you have the option of measuring how people feel immediately after interacting with one of your versions in the form of scale questions. Scale questions can give you measurements about, say, how the video they watched made them feel. With these measurements for each of your versions, you can quickly determine which version performed best.
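As a rough illustration of comparing scale measurements across versions, the sketch below summarizes ratings per version and computes Welch's t statistic as one common way to gauge whether a gap in means is more than noise. The ratings are invented 1 to 7 scores used only for illustration, the function names are hypothetical, and nothing here reflects how any particular platform reports results.

```python
import math
from statistics import mean, variance

def summarize(scores):
    """Basic summary of one version's scale responses."""
    return {"n": len(scores), "mean": mean(scores), "variance": variance(scores)}

def welch_t(a, b):
    """Welch's t statistic: difference in means relative to its standard error."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Invented 1-7 ratings, for illustration only
version_a = [6, 5, 7, 6, 5, 6, 7, 5]
version_b = [4, 5, 5, 3, 6, 4, 5, 4]

summary_a = summarize(version_a)
summary_b = summarize(version_b)
t_stat = welch_t(version_a, version_b)  # positive when A is rated higher than B
```

A larger absolute t value suggests a clearer difference, though with small groups like these it is worth reading the open-ended responses before declaring a winner.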
Follow up your scale questions with an open-ended text response asking for a little more about why they answered the way they did. Open-ends will give you a richer understanding of the "why" behind their answers.
# Defining Your A/B Test Scope
As with all research, it's essential to define your goal. Know what you're trying to learn, even if part of what you want to know is what you're missing. A/B tests are no different. Try to write down what you're hoping to learn and get some buy-in from the team. And of course, try to limit your scope to something meaningful. Don't try to learn everything all at once. Instead, break your research up into modules, the same way you would design other experiences.
# A/B Test Sample Size
When you're running traditional A/B tests on a high-traffic production site, getting large samples is easy. However, when you're running A/B tests on prototypes as part of a version comparison in your UX design process, getting a large sample is harder. Having a larger sample than a traditional user test is vital since we're trying to detect subtle differences and sometimes small effects. As a general rule, we recommend at least 30 participants per version. So a test with versions A and B would require a total of 60 people.
Suppose you come from a traditional user testing world. Testing 60 people may sound like a lot since, famously, many usability tests can uncover around 80% of usability problems with the first 5 participants. That is often true when the goal is finding usability problems. But UX isn't all about usability. It's often about other things: emotions, for example, and preferences. Once we move into this territory, where things get subtle and differences are harder to detect, a larger sample is necessary.
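The five-participant rule of thumb is usually explained with a simple discovery model: if each participant has some fixed chance of hitting a given problem, the share of problems found grows as 1 - (1 - p)^n. The sketch below assumes p = 0.31, the per-participant rate commonly cited from Nielsen and Landauer's work; that value is an assumption here, and real studies vary quite a bit.

```python
def share_of_problems_found(participants: int, detect_rate: float = 0.31) -> float:
    """Expected share of usability problems seen at least once.

    detect_rate is the assumed chance that a single participant
    encounters any given problem.
    """
    return 1 - (1 - detect_rate) ** participants

found_with_five = share_of_problems_found(5)   # about 0.84 under the assumed rate
curve = [round(share_of_problems_found(n), 2) for n in range(1, 11)]
```

The curve flattens quickly, which is why small samples work for finding problems. The model says nothing about measuring preferences or emotions, where you are estimating a difference rather than discovering whether something exists.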
# A/B Testing Costs
A large sample means high costs, right? Maybe. But it doesn't have to be that way. If you limit your scope to just one task and keep participant targeting to a minimum, you can get a larger sample without breaking the bank.