# Comparing Versions

Producing multiple versions of a design is something design teams routinely do. Yet too often we neglect to systematically test how people respond to those versions. We aim to change that.

SoundingBox has two ways to make comparing different versions of your design easy: balanced comparisons and prototype A/B tests. Each has a slightly different approach to helping you answer your research question.


Balanced comparisons and prototype A/B tests are both types of the split test architecture. We provide templates for each to help you get started.

# Balanced Comparison Tests

Balanced comparison tests are a great way to test multiple versions of something while reducing bias.

Let's say you have multiple versions of a prototype and you would like people to give you feedback on each version. But, to reduce bias, you want to alternate the order in which you show people the versions. This is called a balanced comparison test, and it is something SoundingBox fully supports, even when you have three versions.

# Why Balanced Comparison

Perhaps you've read about anchoring bias. But common sense could tell you just as much: the first thing people experience can affect how they feel about subsequent things. A balanced comparison test alternates the stimulus order to reduce anchoring bias.

# How Balanced Comparison Works

One important note to start: balanced comparison doesn't rely on randomization. We're not randomizing the order, since randomization, especially in small studies, could easily result in an order that is not balanced. In other words, randomizing might still mean that one version gets shown more often or that the order isn’t swapped out evenly.
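To see why randomization alone can leave a small study unbalanced, here is a minimal sketch. It assumes a two-version study in which each participant's order is decided by an independent coin flip; the function name is hypothetical and this is not how SoundingBox assigns groups.

```python
from math import comb


def chance_of_even_split(participants: int) -> float:
    """Probability that independent coin flips put exactly half of the
    participants into each of two order groups."""
    if participants % 2:
        return 0.0  # an odd head count can never split evenly
    return comb(participants, participants // 2) / 2 ** participants


# Illustrative study sizes (assumed, not taken from the product).
for n in (10, 20, 40):
    print(n, round(chance_of_even_split(n), 3))
```

With 10 participants, for example, a coin flip produces an exact 5/5 split only about a quarter of the time, which is why a fixed alternation is the safer choice.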

A Balanced Comparison of Two Prototypes
Each prototype version has its own URL, and each of the two groups of participants gets the URLs in alternating order.

A balanced comparison test, like a prototype A/B test or a competitive test, leverages SoundingBox's ability to segment your participants into distinct groups. In a test that compares two versions of a prototype, you'll have two groups and two tasks. Group one will see the URL of the first prototype version (URL A) first. Group two will see the URL of the second prototype version first (URL B), helping to eliminate any ordering bias that could occur.

Things get more interesting when you want to do a balanced comparison of three prototype versions. Here you're going to need six groups and three tasks to handle all the possible combinations.

A Balanced Comparison of Three Prototypes
Six groups are required to get all the possible combinations.

Notice how each group gets a different order of the three prototypes. Clearly a two-prototype balanced comparison is simpler, but we've noticed that teams often produce three alternatives rather than just two, and now you have a way to test them.
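The group counts above follow from listing every possible order: two versions give two orders, three versions give six. The sketch below illustrates that idea and deals participants into the order groups in turn. The function names and participant IDs are hypothetical; this is an illustration of the principle, not SoundingBox's implementation.

```python
from itertools import cycle, permutations


def balanced_orders(versions):
    """Every possible presentation order: 2 for two versions, 6 for three."""
    return list(permutations(versions))


def assign_round_robin(participant_ids, versions):
    """Deal participants into order groups in turn, so group sizes
    never differ by more than one."""
    orders = cycle(balanced_orders(versions))
    return {pid: next(orders) for pid in participant_ids}


orders = balanced_orders(["A", "B", "C"])
assignment = assign_round_robin(range(1, 13), ["A", "B", "C"])

for order in orders:
    members = [pid for pid, seen in assignment.items() if seen == order]
    print(" > ".join(order), members)
```

With twelve participants, each of the six orders ends up with exactly two people, which is the balance that a random draw cannot guarantee.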

# Balanced Comparison Best Practices

Remember that with a balanced comparison you're asking one person to look at two or three versions of something. As with all research, carefully consider what you want people to respond to. Asking yourself a few questions may help.

  • Is the prototype constructed in such a way as to make it easy for the participant to zero in on what you want them to look at in each task?
  • Have you provided clear instructions in the task text or via an instruction question type?
  • Have you taken steps to avoid the participant fatigue that comes from asking for too much in one test?

All of these best practices are somewhat related. Essentially, be clear in your mind and in your instructions about what you're looking for. Adding an instruction question type before all your tasks, to let people know what you're going to ask them to do, is a great idea.

# Setting up a Balanced Comparison

It's easy to get started creating a balanced comparison with our balanced comparison templates. If you haven't created an account, you should do so now. You won't be charged until you're ready to launch your first study. You can set up your first test and preview the participant experience at no cost. There's no free trial to expire.

  1. Log in and click on create new study.
  2. Choose either the two or three version balanced comparison template.
  3. Add your version URLs in the dialog that pops up. If you don't have versions yet, just enter any URL. You can edit them later.
  4. Customize the template to your needs.
  5. Set up your screening questions to determine who can take your test.
  6. The final step is to set your quotas. For balanced comparisons, we recommend choosing yes to get the same mix of participants in each group. Keeping the mix of people the same across groups stays close to your goal of reducing sources of bias. Also enter how many participants you'd like to complete the study.
  7. Publish your study and click on Preview to try it out as a participant. If it looks good, you're all set to launch.


A balanced comparison asks the test participant to compare versions and tell you what they preferred. Adding a question with only one answer choice (single select) tells you which version was preferred in one data point. Both balanced comparison templates have a question like this.

# Analyzing Results

As with all split tests, your results will be divided into groups. With balanced comparison tests, your groups correspond to the order in which participants see the prototype versions. This ordering is important because your goal was to determine which prototype version people preferred, so looking at your data at the group level is key.

Our dashboard makes it easy to toggle back and forth between group and study-level data by providing a toggle switch. It merges and splits your groups in one click.

If you're following our recommendation and asking a single select question to capture which version participants preferred, exercise some care in looking at the group preference data. Remember that each group saw the prototypes in a different order. If someone in group A preferred version one, it's not the same as someone preferring version one in group B. Make sure that you're looking at the summary of this question at the group level, rather than the study level. By focusing on the group, you'll know what version one or two means, since the group determines the sequence.
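If you tally this question by hand, the translation from "position shown" to "actual version" can be sketched as follows. This assumes the answer choice refers to the position in which a version was shown; the group names, data shape, and function name are hypothetical and do not reflect any SoundingBox export format.

```python
from collections import Counter

# Hypothetical mapping: which version each group saw first and second.
GROUP_ORDER = {
    "Group 1": ("Version A", "Version B"),
    "Group 2": ("Version B", "Version A"),
}


def tally_preferences(responses):
    """Translate (group, preferred position) answers into version counts.

    `responses` is an iterable of (group name, 1-based position) pairs.
    """
    tally = Counter()
    for group, position in responses:
        tally[GROUP_ORDER[group][position - 1]] += 1
    return tally


# Made-up answers for illustration only.
responses = [("Group 1", 1), ("Group 1", 2), ("Group 2", 2), ("Group 2", 2)]
print(dict(tally_preferences(responses)))
```

Note that a "second" answer counts toward Version B in Group 1 but toward Version A in Group 2, which is exactly why the study-level summary of this question can mislead.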

You've used your groups to puzzle out which version people preferred. Make a note of it, and move on to exploring the replays and any open-ended responses you collected for clues about why people made their choice. You may find it easier to work with your study data in its merged state at this point since splitting it into your groups has mostly served its purpose.

Read more about our analysis dashboard.

# When to Use Balanced Comparison

Balanced comparison can help you choose between alternative designs. But should it always be your go-to when you have multiple versions of something that you want to test?

It could be. But consider whether the difference you are trying to detect is something that testers (ordinary people, not designers) can be expected to distinguish. Remember: a key concept with balanced comparison is that test participants notice and tell you which version they prefer, and hopefully articulate why they prefer it. But what if the difference is something as subtle as an alternative layout or a different color scheme? As designers we know such things matter and sometimes influence unconscious behavior, but that doesn't mean people will notice them and be able to talk about them.

Let common sense be your guide. Ask yourself: is this something too subtle for a balanced comparison? If that's the case, you may well be better served by a prototype A/B test type. Prototype A/B tests are designed just for this: to detect the effect of subtle differences in design.

# Prototype A/B Tests

You've got two or more versions of a prototype, and you need to test how people react to subtle differences in each.

Producing multiple versions of a prototype is a cornerstone of any design process. Your job is to generate ideas and see which one sticks. But getting to the final version can be hard when there is disagreement among the team. Perhaps the client has asked for something more empirical. A lot is riding on your decision. More inputs shouldn't be a blocker.

Testing multiple prototype versions can be handled through a balanced comparison test. This can be a great option if the things you're comparing are easy for people to detect and consider. But what if the differences are subtle? And what if you're looking to get empirical data about what people prefer?

Asking testers to see the differences in design alternatives can be tricky. The things you care about, the things that are different, the things which you know as a designer matter, can go unnoticed by test participants.


When you've got version differences that will be easy for people to see and discuss, consider balanced comparison. When you've got subtle differences that will be hard to notice or are highly preference-oriented, prototype A/B may be a better choice.

A SoundingBox prototype A/B test solves these problems by creating a workflow for measuring and comparing the differences between prototype versions. It uses some of the principles from both usability testing and A/B testing. User test participants provide the feedback on your prototype versions, so you're not having to build things out and test with actual customers on a live site. Group A interacts with version A. Group B interacts with version B. Each participant group can be composed of the same mix of people (the same demographic mix), and each group is asked the same questions about their preferences after interacting with the prototype.

Ultimately, your prototype A/B test lets you say that people preferred version A over B N% of the time. Of course, you can also determine the extent to which people felt successful and were successful using each version, all without leaving the comfort, and low cost, of your favorite prototyping tool. Participants aren't asked to detect differences between the designs on their own. Instead, the numbers tell the story, and qualitative data help flesh it out. You can go back to clients and team members and present empirical results, arguing persuasively for one version over the other.

# Creating an A/B Test

Creating an A/B test is made easy with our A/B testing template.

  1. Create an account if you haven't already.
  2. Choose the A/B test template.
  3. Enter a group name for each prototype that you have in the dialog, such as "Version A" and "Version B". This will make it easy for you to identify your prototype versions later.
  4. Customize the template to your needs. If you have additional tasks you would like to include, add those here. Each task that you add has a field for its own alternative prototype versions.
  5. Set up your screening questions to determine who can take your test.
  6. The final step is to set your quotas for any screening questions you have defined. We recommend for prototype A/B tests that you choose yes here. You want the same mix of people to be in each of your groups to reduce sources of bias.
  7. Publish your study and click on Preview to try it out as a participant. If it looks good, you're all set to launch.

Read more about study design.

# A Note on Sample Size

Since prototype A/B tests usually involve asking people for their opinion of an experience, having an adequate sample size can be vital to making claims about your data. That said, you can remain agile and not break the bank with around 30 participants per group. If you want greater certainty about your results, you're welcome to go higher, and many customers do.
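To get a rough feel for how group size affects certainty, the sketch below computes a Wilson score interval for a proportion, such as the share of a group that rated a version favorably. This is one common statistical approach chosen here for illustration, not a method the dashboard is known to use; the 60% figure and the group sizes are made-up inputs.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed example: 60% favorable responses at two different group sizes.
for n in (30, 100):
    low, high = wilson_interval(round(0.6 * n), n)
    print(f"n={n}: {low:.2f} to {high:.2f}")
```

The interval narrows as the group grows, which is the trade-off to weigh when deciding whether to go beyond 30 participants per group.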

# Analyzing Results

Once your results are in, click on Analyze and load up your study by clicking on the study tile. You'll notice right off that we load your prototype versions to the right of your study. Clicking on the prototype tile loads more tiles, each of which summarizes the questions you've asked. At a glance, you can see which prototype "won" by looking at the Overall tile and toggling between your versions.

Next try clicking on the Comparison tab. Here you'll see a chart showing the same summary data for each prototype version, with one dot for each version. If you've iterated and have prior A/B tests you'd like to compare, load them. You'll see them in this view alongside your current study, letting you see how much things have changed between iterations.

Read more about our analysis dashboard.

# Getting to the "Why"

All of this is just a starting point. You can see which version won the competition, but you still need to come up with some reasons why it won so you can tell the story to your colleagues or your client. That's where the open-ended responses and the replays come in. Often participants will give you clues about their feelings by telling you about them in the open-ended (free text) responses you've asked them to provide. You can find other clues by replaying their interactions. Did they encounter usability problems, or react adversely in verbal comments as they interacted?

You'll find replays in the Replay tab, and you'll find open-ended text responses in the Grid view. Remember that clicking on any tile or data point in the dashboard will sort replays by that measure, making it easy to prioritize which responses to watch first.