
05 December 2018 Krzysztof Kaiser

A few lessons from running A/B tests in a travel-tech company

This is the first post in a series about our User Experience Team. In the next ones we’ll discuss our design process, usability testing workflow and ongoing work on our design system.

Two years ago we decided to incorporate A/B testing into our design process. We were excited to see how even the slightest changes to a design would affect our business. We also felt we shouldn’t pass up the opportunity to create a new, shared language between us – designers – and stakeholders: a language revolving around concepts like “metrics”, “hypothesis” and “data”. By adopting that language we would shift the company culture toward being more data-informed and research-based. Or so we hoped.

Below you’ll find a few observations – or maybe even lessons – from two years of running A/B tests. Some of these remarks relate to culture change, others to testing methodology, and others still are specific to the type of services we provide.

  1. Have developers on your side (or learn JavaScript)

When we started testing we were probably too optimistic about our abilities and the power of the WYSIWYG editor in Google Optimize (our tool of choice). We found out very quickly that the hypotheses we were trying to verify called for a more advanced approach. The visual editor comes in handy when you need to change website copy or the color of a call-to-action button, but we wanted to test more sophisticated scenarios with presentation logic behind them, and that type of experiment required knowledge of JS.

And yet, for reasons to be discussed in another article, we still wanted testing to be the designers’ responsibility. We needed to change our approach.

We came up with the idea of workshops – run by developers (shout-out to Alek), but targeted at designers. Aside from learning the basics of JavaScript, we built a toolkit of scripts for typical tests (e.g. changing a dynamically loaded element).
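For illustration, here’s a minimal sketch of the kind of script that ended up in that toolkit – waiting for an element that is rendered after the initial page load, then applying the variant change. The selector and the new copy are hypothetical, not taken from a real experiment:

```js
// Wait for an element that appears after the initial page load,
// then run the variant change exactly once.
function waitForElement(selector, onFound) {
  var existing = document.querySelector(selector);
  if (existing) {
    onFound(existing);
    return;
  }
  var observer = new MutationObserver(function () {
    var el = document.querySelector(selector);
    if (el) {
      observer.disconnect();
      onFound(el);
    }
  });
  observer.observe(document.body, { childList: true, subtree: true });
}

// Hypothetical usage: rewrite the copy of a dynamically injected CTA button.
waitForElement('.search-form .submit-button', function (button) {
  button.textContent = 'Find my flight';
});
```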

For more complicated experiments we still pair up with a developer. It really speeds up setting up the experiments.

In our case the reinforcements came organically – some of the devs were already interested in our tests and offered to help – but in retrospect we should have predicted this situation and been better prepared for it.

  2. Don’t come up with an idea. Have a hypothesis.

Coding skills aside, creating an A/B test is relatively easy. And that’s probably the trap we fell into when we were running a shitload of experiments just to see what would happen.

Nowadays we’re trying to make sure that every idea for a test – no matter who comes up with it – takes the form of a hypothesis.

[Image: hypothesis framework. Source: Optimizely]

A hypothesis is our assumption about the results of an interface change. It should clearly state the place being changed (for example: the login section on the transaction form page) and the predicted outcome, formulated in a way that makes it possible to check unequivocally whether the goal was reached.

It’s important that the prediction is based on research, analysis or observation – which is just another way of saying it should have a rationale.

Usually our hypotheses are based on observations from user session replays (we’re using Hotjar for that) or traffic behaviour in Google Analytics. Some of them are the result of shared business knowledge of a certain market, while others come from qualitative research (usability testing and in-depth interviews).

Hypothesis example:

If we turn off the cheapest-flights table on search results on the Spanish market, RPM will increase by 10%. The prediction is based on an analysis of a similar market and a conversion-funnel analysis in Google Analytics.

  3. “The goal is conversion.” Really?

Before running tests you should develop the metrics that will be used to measure the effectiveness of test variants. There is a variety of metrics, and choosing one over another depends on the functionality you’re testing, the type of product, or the specifics of the industry.

Examples:

  • Newsletter signup or price-alert signup. In this case the goal is to build as large a database of e-mails as possible, so the right metric is the number of signups.
  • Contact form. The number of sent messages can still be used as a metric, but only if the contact form is one of the sales channels. However, if the form is used only as a last resort for clients who have issues with the product, the key metric will be different: in that case frequency of use is a symptom of problems rather than a measure of success.

In travel tech, the volume of sales is very high, services are often more complicated than they appear, and the business model is based mostly on margin. Under these conditions the main business goal is not conversion, but rather high revenue per mille (RPM) of user sessions or page views. Dialing up sales instead of concentrating on revenue comes with risks: an increase in claims costs and in customer service load before and after the sale. So the goals and metrics you choose depend deeply on context and are connected to your organization’s business goals.
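To make that concrete, here’s a rough sketch of the metric itself – RPM computed per test variant. The variant names, numbers and data shape are made up for illustration:

```js
// Revenue per mille: revenue generated per 1000 sessions (or page views).
function rpm(revenue, sessions) {
  return (revenue / sessions) * 1000;
}

// Hypothetical experiment results – not real data.
var variants = [
  { name: 'control',           revenue: 48200, sessions: 310000 },
  { name: 'no-cheapest-table', revenue: 55900, sessions: 308000 }
];

variants.forEach(function (v) {
  console.log(v.name + ' RPM: ' + rpm(v.revenue, v.sessions).toFixed(2));
});
```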

  4. The end is just the beginning.

The end of an experiment is just the beginning of in-depth data analysis.

In our industry the deciding factors that affect motivation to purchase (and hence the conversion rate) are:

  • time to flight
  • the length of the journey
  • type of flight (domestic, international, intercontinental)
  • type of airline (low-cost or regular)

Someone who searches for a flight 3 days in advance accepts the higher cost compared to a flight taking place in 3 months. Perhaps they’re motivated by the urgency to be in a certain place at a certain time (for example: they’re speaking at a conference). Someone seeking out flight deals for a cheap city break displays totally different behavior – that person is in discovery mode rather than in need of a purchase.

In other words, the conversion rate for a person searching for a flight on short notice is way better than that of someone looking for flights for next year’s holiday season. Observing behaviors in relation to the aforementioned factors is one of the bases for persona creation.

To acquire meaningful results we need to make sure that the distribution of users with similar behaviours is more or less equal across the tested groups. That often requires more in-depth analysis and segmentation of traffic after the test.

Another thing is that every sample of data is in some way contaminated – for example, by atypical traffic generated by users who display strange behaviors, or behaviors simply not connected to our personas. That kind of traffic should be removed from the analysis; otherwise the variants of the experiment become impossible to compare in a methodologically sound way.
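A simplified sketch of what that post-test analysis can look like: filter out traffic that doesn’t match any persona, then compare variants within behavioural segments. The session fields and thresholds here are hypothetical:

```js
// Bucket a session by time to flight – one of the factors listed above.
function segmentOf(session) {
  if (session.daysToFlight <= 7) return 'urgent';
  if (session.daysToFlight <= 60) return 'planned';
  return 'discovery';
}

// Drop contaminated traffic, then count conversions per variant and segment.
function conversionBySegment(sessions) {
  var stats = {};
  sessions
    .filter(function (s) { return !s.isBot && s.matchesPersona; })
    .forEach(function (s) {
      var key = s.variant + '/' + segmentOf(s);
      stats[key] = stats[key] || { sessions: 0, orders: 0 };
      stats[key].sessions += 1;
      if (s.converted) stats[key].orders += 1;
    });
  return stats;
}
```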

  5. “Do we even test?” Share your results.

We’re trying to build a culture in which commitment to product development is displayed not only by managers, but also by developers and designers. It’s a cliché, but commitment rises when you can see the results of your labour. On the other hand, for non-technical employees A/B testing may seem like something difficult and “techy”. That’s why we try to communicate test results across the whole company in a simple (but not oversimplified) way.

How you share the results often depends on the type of company you work for. But whether it’s a message on a Slack channel, a presentation displayed on TVs in the office, or a weekly newsletter – do it.
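For example, a one-off Slack message can be as simple as a POST to an incoming webhook. The webhook URL and the result text below are placeholders, not real data:

```js
// Post a short summary of a finished experiment to a Slack channel
// via an incoming webhook (URL is a placeholder).
fetch('https://hooks.slack.com/services/T000/B000/XXXX', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Experiment "cheapest-flights-table" finished: variant B won, RPM +8.4% (hypothetical numbers).'
  })
});
```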

Authors: Krzysztof Kaiser, Mariusz Janosz

 
