
Reducing Bias With Balanced Comparison Tests

Balanced comparison tests are a great way to test multiple versions of something while reducing bias.

Let's say you have multiple versions of a prototype and you would like people to give you feedback on each version. But, to reduce bias, you want to alternate the order in which you show people the versions. This is what's called a balanced comparison test, and it's something SoundingBox fully supports, even when you have three versions.

Why balanced comparison

Perhaps you've read about anchoring bias. But common sense could tell you just as much: the first thing people experience can affect how they feel about subsequent things. A balanced comparison test alternates the stimulus order to reduce anchoring bias.

How balanced comparison works

One important note to start: balanced comparison doesn't rely on randomization. We're not randomizing the order, since randomization, especially in small studies, could easily result in an order that is not balanced. In other words, randomizing might still mean that one version gets shown first more often, or that the possible orderings aren't represented evenly.
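
To make the distinction concrete, here's a minimal Python sketch contrasting per-participant randomization with strict alternation. It's purely illustrative (SoundingBox handles the group assignment for you); the version labels and participant count are placeholders.

```python
import random

versions = ["A", "B"]
participants = 10

# Randomizing the order per participant can drift off balance in a
# small study: nothing stops 7 of 10 people from seeing A first.
random_orders = [random.sample(versions, len(versions)) for _ in range(participants)]
print(sum(order[0] == "A" for order in random_orders), "of", participants, "saw A first")

# Strict alternation guarantees an even split: participants are
# assigned to the two orders in rotation, not by chance.
balanced_orders = [versions if i % 2 == 0 else versions[::-1] for i in range(participants)]
print(sum(order[0] == "A" for order in balanced_orders), "of", participants, "saw A first")
```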

A balanced comparison of two prototypes.
Each prototype version has its own URL, and each of two groups of participants gets the URLs in alternating order.

A balanced comparison test, like a prototype A/B test or a competitive test, leverages SoundingBox's ability to segment your participants into distinct groups. In a test that compares two versions of a prototype, you'll have two groups and two tasks. Group one sees the URL of the first prototype version (URL A) first; group two sees the URL of the second prototype version (URL B) first, which balances out any ordering bias.

Things get more interesting when you want to do a balanced comparison of three prototype versions. Here you're going to need six groups and three tasks, because three prototypes can be shown in 3 × 2 × 1 = 6 possible orders.

A balanced comparison of three prototypes.
Six groups are required to get all the possible combinations.

Notice how each group gets a different order of the three prototypes. Clearly a two-prototype balanced comparison is simpler, but we've noticed that teams often produce three alternatives rather than just two, and now you have a way to test them.
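
If you want to see why exactly six groups are required, a couple of lines of Python enumerate every possible presentation order. The labels A, B, and C are placeholders for your prototypes; which order goes to which group in your study is up to you.

```python
from itertools import permutations

# Hypothetical labels standing in for your three prototype versions.
versions = ["A", "B", "C"]

# Three prototypes can be ordered 3 x 2 x 1 = 6 ways, one per group.
for group_number, order in enumerate(permutations(versions), start=1):
    print(f"Group {group_number}: {' -> '.join(order)}")

# Group 1: A -> B -> C
# Group 2: A -> C -> B
# Group 3: B -> A -> C
# Group 4: B -> C -> A
# Group 5: C -> A -> B
# Group 6: C -> B -> A
```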

Balanced comparison best practices

Remember that with a balanced comparison you're asking one person to look at two or three versions of something. As with all research, carefully consider what you want people to respond to. Asking yourself a few questions may help.

  • Is the prototype constructed in such a way as to make it easy for the participant to zero in on what you want them to look at in each task?
  • Have you provided clear instructions in the task text or via an instruction question type?
  • Have you taken steps to avoid participant fatigue by asking for too much in one test?

All of these best practices are related. Essentially, be clear in your mind and in your instructions about what you're looking for. Adding an instruction question type before your tasks to let people know what you're going to ask them to do is a great idea.

Setting up your balanced comparison, step-by-step

It's easy to get started creating a balanced comparison. If you haven't created an account, you should do so now. You won't be charged until you're ready to launch your first study. You can set up your first test and preview the participant experience at no cost. There's no free trial to expire.

  1. Log in and click on create new study.
  2. Give your study a name and choose experience2 as your test type.
  3. Choose your devices and people, taking care to include at least five people in each of your groups. So a two-prototype balanced comparison should have at least 10 participants, and a three-prototype balanced comparison, with its six groups, needs at least 30.
  4. Set up your screening questions to determine who can take your test.
  5. Next, the study step will pop up a dialog asking you to provide your group names. If you have two prototypes, you'll have two groups; if you have three, you'll have six. Unlike other test types that rely on groups, the group names here can be as generic as "Group 1", "Group 2" or "Sequence 1", "Sequence 2", and so on.
  6. Choose a template or start adding your tasks and questions one by one. Click on any task that you have added. You'll notice that the groups you defined appear in your task. This is where you put your prototype URLs, alternating the order based on the diagram above. You may want to keep a spreadsheet of prototype URLs and their respective groups to keep things straight, especially for a three-prototype test; the sketch after these steps shows one way to generate that mapping. You'll need two tasks for a two-prototype test and three for a three-prototype test.
  7. The final step is to set quotas for any screening questions you have defined. For balanced comparisons, we recommend choosing yes here: you want the same mix of people in each of your groups to stay true to your goal of reducing sources of bias.
  8. Initialize your study and click on Get Study URL to try it out as a participant. If it looks good, you're all set to launch.
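
If keeping the URL spreadsheet by hand feels error-prone, here's a short sketch that generates the group-to-task mapping and the minimum participant counts from step 3. Everything in it (the URLs and labels) is a placeholder; this is not something SoundingBox requires or provides.

```python
from itertools import permutations
from math import factorial

# Placeholder URLs; substitute your actual prototype links.
urls = {
    "A": "https://example.com/prototype-a",
    "B": "https://example.com/prototype-b",
    "C": "https://example.com/prototype-c",
}

n = len(urls)
groups = factorial(n)          # 2 prototypes -> 2 groups, 3 -> 6
min_participants = 5 * groups  # at least five people per group
print(f"{n} prototypes: {groups} groups, at least {min_participants} participants")

# One row per group: the URL to paste into task 1, task 2, and task 3.
for group_number, order in enumerate(permutations(urls), start=1):
    print(f"Group {group_number}: " + " | ".join(urls[v] for v in order))
```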

Analyzing your balanced comparison results

As with all experience2 tests, your results will be divided into groups. With balanced comparison tests, your groups correspond to the order in which participants saw the prototype versions. This ordering is important because your goal is to determine which prototype version people preferred, so looking at your data at the group level is key.

Our dashboard provides a toggle switch that makes it easy to move back and forth between group- and study-level data, merging and splitting your groups in one click.

Toggling between merged and split groups in the dashboard.
Merging your groups combines all your group data into one study-level view, while splitting partitions your data by group, something you'll need to do in order to determine which version people liked most.

If you're following our recommendation and asking a single-select question to capture which version participants preferred, exercise some care when looking at the group preference data. Remember, each group saw the prototypes in a different order, so "version one" refers to the first prototype that group saw, not to the same underlying design. Someone in group A preferring version one is not the same as someone in group B preferring version one. Make sure that you're looking at the summary of this question at the group level, rather than the study level. By focusing on the group, you'll know what version one or two means, since the group determines the sequence.
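
One way to keep the mapping straight when tallying: translate each group-relative answer back to the prototype it actually refers to before combining groups. This is a hypothetical sketch over exported answers, not a SoundingBox feature; the group orders and responses shown are illustrative.

```python
# Presentation order per group, matching your own study setup (these
# particular orders are illustrative, not prescribed by SoundingBox).
group_order = {
    "Group 1": ["A", "B", "C"],
    "Group 2": ["A", "C", "B"],
    "Group 3": ["B", "A", "C"],
    "Group 4": ["B", "C", "A"],
    "Group 5": ["C", "A", "B"],
    "Group 6": ["C", "B", "A"],
}

# Hypothetical exported answers: each participant reports the *position*
# they preferred (1 = the first version their group saw).
answers = [("Group 1", 1), ("Group 4", 2), ("Group 6", 1)]

# Translate position-based answers into actual prototype identities
# before tallying preferences across all groups.
tally = {}
for group, position in answers:
    prototype = group_order[group][position - 1]
    tally[prototype] = tally.get(prototype, 0) + 1

print(tally)  # {'A': 1, 'C': 2}: prototype C wins this tiny sample
```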

You've used your groups to puzzle out which version people preferred. Make a note of it, and move on to exploring the replays and any open-ended responses you collected for clues about why people made their choice. You may find it easier to work with your study data in its merged state at this point since splitting it into your groups has mostly served its purpose.

Read more about our analysis dashboard.

How balanced comparison compares to other test types

Balanced comparison can help you choose between alternative designs. But should it always be your go-to when you have multiple versions of something that you want to test?

It could be. But consider whether the difference you are trying to detect is something you can assume testers (ordinary people, not designers) can distinguish. Remember: a key premise of balanced comparison is that participants notice the differences, tell you which version they prefer, and hopefully articulate why. But what if the difference is as subtle as an alternative layout or a different color scheme? As designers we know such things matter, and sometimes influence unconscious behavior, but that doesn't mean people will notice them and be able to talk about them.

As in all things, common sense can be your guide. Ask yourself: is this difference too subtle for a balanced comparison? If so, you may well be better served by our prototype A/B test type. Prototype A/B tests are designed for exactly this: detecting the effect of subtle differences in design.