Growth Hacking and the Bandit Problem

“Growth Hacking and the Bandit Problem” is a recent talk by the awesome Noel introducing multi-armed bandits as a superior way of A/B testing. In case you missed it, we decided to write it up as a blog post.

Our Hero

We begin by introducing our main character and hero of the story, the growth hacker. He is driven by one thing and one thing alone: pushing growth ever upwards and ever rightwards. Swoon.

To do this he follows 3 simple steps: Build, Measure, and Learn, a process handed down by the prophet of growth, Eric Ries. These 3 steps give our hero what he needs: a structured process to drive growth. Starting at the top, he gets something built, let’s say a new sign up page. He then takes some time to collect data and measure its effectiveness. Once this is over he sits down with his data and he learns, making a decision based on those results and informing the next iteration of the cycle. Round he goes again!

Our growth hacker will use his wide range of skills at each stage of the cycle but his main objective is always achieving rapid growth. The speed at which he can get round this cycle will determine how fast and how far his metrics, and ultimately the business, can grow.

Faster is Better

Driven by a need for speed, our growth hacker takes a look at each step in his engine of growth to see where he can go faster. He starts off with build. Hmm there doesn’t seem to be much he can do here. Our growth hacker’s already pretty agile on the dev front.

Learning already seems to happen pretty fast, once he’s got all his data together. But measure? Now measure seems like a place where he might be able to speed up. At the moment he’s using A/B testing. Collecting all the data he needs to make a sound statistical decision takes a long time.
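To put a rough number on "a long time", here is a minimal sketch of the usual normal-approximation sample size calculation for comparing two conversion rates. The function name, the 10% and 12% conversion rates, the 5% significance level and the 80% power are all assumptions picked for illustration, not figures from the talk.

```python
import math
from statistics import NormalDist


def visitors_per_variant(base_rate, target_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided test
    comparing two conversion rates (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = base_rate * (1 - base_rate) + target_rate * (1 - target_rate)
    effect = target_rate - base_rate
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Hypothetical example: detecting a lift from a 10% to a 12% conversion rate.
print(visitors_per_variant(0.10, 0.12))
```

With these made-up numbers the answer comes out at roughly 3,800 visitors per variant, and halving the lift you want to detect roughly quadruples it. For a site with modest traffic that can mean weeks of waiting before anything is learned.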


What if measuring and learning could happen together? What if we could turn our 3 step process into a 2 step process and speed it up dramatically? We could change our metrics chart to look like the green line instead of the orange, allowing our growth hacker to iterate and optimise as fast as he can! Well surprise surprise, you guessed it, our hero the growth hacker has just discovered the multi armed bandit, a way to drive growth faster than ever before!


The Multi Armed Bandit (Growth Hackers’ Secret Sauce)

“Woah there,” I hear you say. “Let’s just hold on a minute here and have a bit of background on this multi-armed bandit. Where is it from and what’s it all about?” Well, like all good secret powers, the multi-armed bandit started off as a problem: the bandit problem.

Imagine walking into a casino. You head straight for a room full of slot machines or, as they’re called in the US, one armed bandits. You’re a clever egg, so no doubt you’re thinking that some of these machines are going to pay out more than others. You want to make sure you maximise your reward by finding and playing the bandit that pays out most. This is the bandit problem.

After a long hard think, and a lot of maths, you come up with a formula that helps you to find the machine that pays out most as soon as possible. Hurray! This means you don’t waste your money trying other machines that pay less often. The formula is called a bandit algorithm, and with this in hand our growth hacker receives the mighty powers of the multi-armed bandit!

Now Back to the Story

With his trusty multi-armed bandit (MAB) at his side, our growth hacker can now set to the task in hand. Today he’s increasing conversions on a signup page. With 3 variants of the web page to choose from, the MAB shows one of them to each visitor, and its reward comes when a visitor clicks on the orange button and converts. The scores are totted up and the process is merrily repeated each time a new visitor comes to the site.

Two key Ingredients to the Secret Sauce

So far so good, but doesn’t this sound familiar? Up to this point our MAB has been purely exploring: trying variants at random and totting up their scores, which is the main feature of A/B testing. But remember, the goal of a MAB is to maximise the total reward, so to do this it brings another element into the mix: exploitation, which means showing the variants that have worked best in the past. This delicate balancing act between exploration and exploitation plays out for the duration of the test as the multi-armed bandit happily goes about measuring and learning, at all times working to maximise total reward.
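The balancing act described above can be sketched in a few lines of Python. This is a minimal illustration, not Myna's implementation: the class and function names, the 10% exploration rate and the simulated conversion rates are all assumptions made up for the example.

```python
import random


class EpsilonGreedy:
    """Explore a random variant with probability epsilon, otherwise
    exploit the variant with the best conversion rate so far."""

    def __init__(self, n_variants, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.shows = [0] * n_variants    # times each variant was shown
        self.rewards = [0] * n_variants  # conversions each variant earned
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        # Exploration: pick any variant at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.shows))
        # Exploitation: pick the best observed rate, breaking ties at random.
        rates = [r / s if s else 0.0 for r, s in zip(self.rewards, self.shows)]
        best = max(rates)
        return self.rng.choice([i for i, rate in enumerate(rates) if rate == best])

    def update(self, variant, reward):
        self.shows[variant] += 1
        self.rewards[variant] += reward


def simulate(true_rates, visitors, epsilon=0.1, seed=0):
    """Run simulated visitors against variants with made-up conversion rates."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(len(true_rates), epsilon, rng)
    for _ in range(visitors):
        variant = bandit.choose()
        converted = rng.random() < true_rates[variant]
        bandit.update(variant, 1 if converted else 0)
    return bandit


# Hypothetical signup page with 3 variants converting at 5%, 10% and 20%.
result = simulate([0.05, 0.10, 0.20], visitors=10_000)
print(result.shows)
```

Each visitor triggers one call to choose and one call to update, so measuring and learning happen in the same step. Over a long enough run, the variant that converts best should end up being shown far more often than the others.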

What are My Options?

We’ve just sketched an algorithm known as E-Greedy (short for epsilon-greedy). When it comes to MAB algorithms there are lots to choose from, such as E-Greedy, Thompson Sampling, UCB-1 or Myna. Not all will perform in the same way or deliver the same results, as you can see in the chart below: (Ooh look, Myna’s the most successful. Who would have guessed?!)
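To give a flavour of how these algorithms differ, here is a minimal sketch of the selection rule from the textbook version of UCB-1. Instead of exploring at random, it adds an optimism bonus to each variant's observed conversion rate, and the bonus shrinks the more often that variant has been shown. The function name and the counts in the example are assumptions for illustration.

```python
import math


def ucb1_choose(shows, rewards):
    """Return the index of the variant UCB-1 would show next.

    shows[i] is how often variant i has been shown and rewards[i]
    is the total reward (e.g. conversions) it has earned.
    """
    # Show every variant once before comparing them.
    for i, n in enumerate(shows):
        if n == 0:
            return i
    total = sum(shows)
    scores = [
        rewards[i] / shows[i] + math.sqrt(2 * math.log(total) / shows[i])
        for i in range(len(shows))
    ]
    return scores.index(max(scores))


# Hypothetical counts: the middle variant has barely been tried, so its
# large bonus wins out over the better observed rates of the other two.
print(ucb1_choose([1000, 10, 1000], [200, 1, 200]))
```

Once every variant has been shown a similar number of times the bonuses even out, and the choice comes down to the observed conversion rates.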

Real Life

Now I know what you’re thinking: fancy graphs based on simulated data are all well and good but I want to hear about some real life results. Look no further than one of our customers: Vizify, a startup working to create beautiful online portfolios. In order to improve their user engagement they decided to deploy Myna to optimise their email subject lines. Because Myna is so efficient with data, in just a few days Vizify had received a 500% increase in clickthroughs (pretty impressive for a startup with small amounts of traffic).

With Great Power comes Great Responsibility

When using MABs there are a few things to bear in mind:

Your workflow will change dramatically. It’s going to become simpler, as Myna is going to do all the work for you. It will be faster: because it’s so efficient with data, you’ll get results at lightning speed. It’s also way more flexible. You will wave goodbye to setting parameters in advance (experiment length and p-value), and can add and remove variants at any time, testing almost anything you set your mind to!
Defining rewards
The ideal reward measure for any A/B test is most likely customer lifetime value, but you probably can’t measure this very quickly. You need a fairly fast feedback cycle so the algorithm can adapt in a reasonable time period. Using a simple measure like conversions is fine, but as with any test you should check that it correlates with your true performance metrics.
Stable Preferences
The algorithms we’ve discussed only work when users have stable preferences. We don’t mean that all users act in the same way, but rather that their behaviour is similar in aggregate and stable over time. Broadly speaking, we assume what works today will continue to work tomorrow. For UI elements this is generally the case, but it is not, for example, true for news items, where the value of the story is strongly time dependent.

And that’s it. Our growth hacker’s secret sauce, the multi-armed bandit, has been transformed into an almighty T-Rex. Jump on its back, ride off into the sunset and do what growth hackers do best: grow, fast! RAWWWRR!

New addition to the team

The eagle-eyed among you may have noticed that we’ve got a new addition to the team. Please welcome our new mascot, Monique:

As the old site design clashed with her beautiful colouring, we threw it all away and gave the site a bit of a polish. Please excuse any empty paint cans around the place.

Monique is a Balinese myna bird, a critically endangered species. Though exact population numbers are unknown, efforts are being made to protect and strengthen the wild population, which has been severely diminished by poachers.

The Bali Myna can also be found on Indonesian currency, and you can read more on

A Myna by any other name

What does a small bird, native to South-East Asia, have to do with A/B testing?

Left: Common Myna, photo by Yotam Orchan, Assaf Shwartz (source)
Right: Bali Myna, photo by Jcwf at nl.wikipedia (source)
Images licensed under the GNU Free Documentation License

The answer? It’s all in the algorithms.

Some of the sub-species of the myna (or mynah) are considered talking birds, i.e., birds that can mimic human speech.

As a myna (the bird) mimics human words, and is rewarded by positive reinforcement from its owner (or delicious, delicious seeds), it learns which are the most desirable words or phrases and knows to repeat them more in future.

In much the same way, as Myna (the revolutionary new A/B testing solution) repeats your variants, and is rewarded by successful conversions, it learns which is the most successful variant and knows to repeat that more in future.

As the page views roll in on a new test, we like to picture an excitable bird, squawking away and gobbling down treats, learning not to swear in front of polite company.

Even in our short lifetime, we’ve seen multiple spellings of the word: from minah to miyna, mynor to mhyna (what can I say, sometimes I just type too fast). While both myna and mynah are considered valid spellings for the bird, the A/B testing tool is only ever spelled Myna.