
Decide Between Alternatives With Prototype A/B Tests

You've got two or more versions of a prototype, and you need to test how people react to each.

Producing multiple versions of a prototype is a cornerstone of any design process. Your job is to generate ideas and see which one sticks. But getting to the final version can be hard when there is disagreement among the team. Maybe you don't have a clear way to tell the versions apart yourself. Perhaps the client has asked for something more empirical. A lot is riding on your decision, and getting more input shouldn't be a blocker.

Traditionally, testing multiple prototype versions has been handled in two somewhat less-than-optimal ways.

  1. Build out the alternatives and do a formal A/B test using any number of A/B testing platforms.
  2. Do a basic usability test where you ask the user to compare the alternatives and pick one that they like.

The first approach is fine if you have developers to build out your versions and don't mind using real customers as test subjects.

The second approach, a usability test, works if you can rely on your testers to notice the differences between the alternatives. That seems easy, but it often isn't. The differences you care about, the ones you know as a designer matter, are likely to go unnoticed, at least consciously, by test participants. No matter how carefully you present the alternatives, your results can end up a muddled mess.

There is an easier way

A SoundingBox experience2 prototype A/B test solves these problems by giving you a workflow for measuring and comparing the differences between prototype versions. It borrows principles from both usability testing and A/B testing. User test participants provide feedback on your prototype versions, so you don't have to build anything out or test with actual customers on a live site. Group A interacts with version A, and group B interacts with version B. Each participant group can be made up of the same mix of people (the same demographic mix), and each group is asked the same questions about their preferences after interacting with the prototype.

At the end of the day, your prototype A/B test lets you say people preferred version A over B n% of the time. Of course, you can also determine the extent to which people felt successful and were successful using each version, all without leaving the comfort, and low cost, of your prototyping tool. Participants aren't asked to detect differences between the designs on their own. Instead, the numbers tell the story, and qualitative data help flesh it out. You can go back to clients and team members and present empirical results, arguing persuasively for one version over the other.
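If you want to sanity-check a claim like "people preferred version A" before presenting it, here is a minimal sketch in plain Python of a standard two-proportion z-test comparing the share of favorable responses in each group. This is an illustration, not SoundingBox's own analysis code, and the counts are made-up example numbers.

```python
import math

def two_proportion_ztest(wins_a, n_a, wins_b, n_b):
    """Compare the share of favorable responses in group A vs. group B.

    Returns both proportions, the z statistic, and a two-sided p-value
    based on the normal approximation (reasonable at ~30+ per group).
    """
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

# Hypothetical results: 24 of 30 in group A rated the experience favorably,
# versus 15 of 30 in group B.
p_a, p_b, z, p = two_proportion_ztest(24, 30, 15, 30)
print(f"A: {p_a:.0%} favorable, B: {p_b:.0%} favorable, z={z:.2f}, p={p:.3f}")
```

With these example numbers the gap (80% vs. 50%) comes out statistically significant, which is the kind of evidence that holds up in front of a client.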

Creating your first prototype A/B test

Create an account if you haven't already. Create a new study and select experience2 as your test type. Experience2 tests are like other SoundingBox studies in that they are made up of tasks (things we ask people to do) and questions (things we ask people about how they felt doing those tasks).

Setting up your groups

Partitioning your responses into groups happens when you set up your tasks. Tasks in an experience2 test have additional properties that let you enter multiple group URLs per task. On the backend, we take care of the rest. If you ask screening questions and choose to get an even mix of participants, the same mix of participants will interact with each prototype URL, giving you the apples-to-apples comparison you need.
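To picture the "even mix" idea, here is a toy sketch of round-robin assignment within each screener segment, so every prototype URL sees the same demographic mix. This is not SoundingBox's actual backend, and the field names and URLs are invented for illustration.

```python
from collections import defaultdict
from itertools import cycle

def assign_groups(participants, group_urls):
    # Rotate through the group URLs separately within each screener
    # segment, so every group ends up with the same demographic mix.
    counters = defaultdict(lambda: cycle(range(len(group_urls))))
    assignment = {}
    for person in participants:
        idx = next(counters[person["segment"]])
        assignment[person["id"]] = group_urls[idx]
    return assignment

participants = [
    {"id": 1, "segment": "18-34"},
    {"id": 2, "segment": "18-34"},
    {"id": 3, "segment": "35-54"},
    {"id": 4, "segment": "35-54"},
]
urls = ["https://example.com/proto-a", "https://example.com/proto-b"]
print(assign_groups(participants, urls))
```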

Read more about designing your study.

A note on sample size

Since experience2 tests involve asking people how they felt about an experience, having an adequate sample size is vital to making claims about your data. That said, you can remain agile, and not break the bank, with around 30 participants per group. If you want greater certainty about your results, you're welcome to go higher, and many customers do.
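As a rough back-of-the-envelope illustration (not a SoundingBox feature), the snippet below prints the approximate 95% margin of error on a reported percentage for a few group sizes. At 30 per group it is around plus or minus 18 points, and it narrows as you add participants.

```python
import math

def margin_of_error(p, n, z=1.96):
    # Half-width of a normal-approximation 95% confidence interval
    # for a proportion p observed in a group of n participants.
    return z * math.sqrt(p * (1 - p) / n)

for n in (30, 60, 120):
    print(f"n={n:3d}: +/- {margin_of_error(0.5, n):.1%}")
```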

Read more about screening participants.

Analyzing your results

Once your results are in, click on Analyze and load your study by clicking on the study tile. You'll notice right away that we load your prototype versions to the right of your study, and clicking on a prototype tile loads more tiles, each summarizing a question you've asked. At a glance you can see which prototype "won" by looking at the Overall tile and toggling between your versions.

Next try clicking on the Comparison tab. Here you'll see a chart showing the same summary data for each prototype version, with one dot for each version. If you've iterated and have prior A/B tests you'd like to compare, load them and you'll see them in this view alongside your current study, letting you see how much things have changed between iterations.

Read more about our analysis dashboard.

Getting to the "why"

All of this is just a starting point. You can see which version won the face-off, but you still need to come up with reasons why it won so you can tell the story to your colleagues or your client. That's where the open-ended responses and the replays come in. Often participants will give you clues about their feelings in the open-ended (free text) responses you've asked them to provide. You can find other clues by replaying their interactions. Did they hit usability problems, or react negatively in their spoken comments as they interacted?

You'll find replays in the Replay tab, and you'll find open-ended text responses in the Grid view. Remember, clicking on any tile or data point in the dashboard will sort replays by that measure, making it easy to prioritize which responses to watch first.