So far in this series, I’ve highlighted several pitfalls on the path to product experimentation. We’ve seen how some common mistakes can slow down your testing, stall your velocity, send your team in circles, or lead you down the wrong path entirely. But what if you already have a budding culture of experimentation and you’re seeing a steady stream of successful tests?
Congratulations! You’re in for an exciting ride. Great experiments kick off a virtuous cycle: they inspire others to question their own assumptions, and that in turn locks in a culture of hypothesis thinking that spreads through an organization. I’m often amazed by how quickly a company can go from 1 experiment a month to 10, and then to 100.
Unfortunately, this progression is never perfectly smooth. As you run more tests, collect more data, and include more people in the process, a whole new set of challenges starts to appear. This final post is all about those pitfalls — the scaling challenges that can stop a growing testing culture in its tracks. To illustrate them, I’ll be sharing examples from two titans of the testing world: Booking.com and Airbnb.
Experimenting in more places
Booking is famous for designing its entire development culture around constant experimentation. But like every testing program, theirs had humble beginnings. The first version of their experimentation platform only supported running a handful of experiments at a time, and when they got started a decade ago, the team couldn’t imagine the scale they’d eventually hit. Now fast forward to the present, in Booking’s own words:
“Overall, on a daily basis, all members of our departments run and analyse more than a thousand concurrent experiments to quickly validate new ideas. These experiments run across all our products, from mobile apps and tools used by hoteliers to customer service phone lines and internal systems. Experimentation has become so ingrained in Booking.com culture that every change, from entire redesigns and infrastructure changes to bug fixes, is wrapped in an experiment… Such democratization is only possible if running experiments is cheap, safe and easy enough that anyone can go ahead with testing new ideas, which in turn means that the experiment infrastructure must be generic, flexible and extensible enough to support all current and future use cases.”
Through all this exponential growth, Booking has had to continuously revamp its testing infrastructure to support new kinds of experiments. At each stage of growth, new teams brought new touchpoints and new challenges. From website conversion optimization, they expanded to native mobile testing, and from there to experiments deep in the backend technology stack. Today, Booking has a team of over 40 developers and statisticians continuously working on improving their testing platform.
At Optimizely, we’ve taken our own version of this journey. Like Booking, we’ve devoted 8+ years and tens of millions of dollars in R&D to expand our “experimentation footprint.” From simple website testing, we’ve followed the lead of the savviest teams to add native mobile experimentation and server-side testing across 10 different languages. Along the way, we’ve had to work through maddeningly subtle problems. Experiments have to work everywhere, but they also have to be rigorous, performant, secure, compliant, consistent, and reliable. If any one of those things goes wrong, in any part of the stack, it can instantly undermine years of trust built up in a testing culture.
If there’s one lesson I can distill from that journey, it’s this: don’t choose your testing platform based on the first experiment you want to run, or even the tenth. Think about where your hundredth or thousandth test might run — and make sure that you’re building or buying technology that can scale far beyond it.
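To make the "consistent" requirement above a little more concrete: one common way to keep a user in the same variation across web, mobile, and server-side code is to derive the assignment from a hash of the user ID and experiment key, so that no shared state is needed. The sketch below is only an illustration under that assumption. The function name `assign_variation`, the choice of SHA-256, and the 10,000-bucket granularity are all hypothetical, and this is not a description of how Optimizely or Booking actually implement bucketing.

```python
import hashlib

# Hypothetical granularity: each user lands in one of 10,000 buckets.
NUM_BUCKETS = 10_000


def assign_variation(user_id: str, experiment_key: str, variations: list[str]) -> str:
    """Deterministically map a user to one variation of an experiment.

    The same (user_id, experiment_key) pair always yields the same
    variation, regardless of which service or platform calls this.
    """
    if not variations:
        raise ValueError("variations must not be empty")

    # Hash the experiment key together with the user ID so that a user's
    # bucket in one experiment is independent of their bucket in another.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).digest()

    # Take the first 8 bytes as an unsigned integer and reduce to a bucket.
    bucket = int.from_bytes(digest[:8], "big") % NUM_BUCKETS

    # Split the bucket range into equal slices, one per variation.
    index = bucket * len(variations) // NUM_BUCKETS
    return variations[index]


if __name__ == "__main__":
    # Illustrative IDs only; any stable string identifier would do.
    print(assign_variation("user-123", "checkout_button", ["control", "treatment"]))
```

Because the assignment depends only on its inputs, any implementation that reproduces the same hashing steps would agree on the result, which is the property that matters once experiments run in many places at once.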
Collecting more data
Scaling experimentation doesn’t just mean supporting new use cases: it also means gathering more data. A lot more data. Enough to take your cleverly designed analytics pipeline and blow it up, over and over again. I’ve lived through this particular challenge three or four times now, so in a strange way, it was comforting to read about Airbnb’s long saga of scaling their Experiment Reporting Framework (ERF):
“The number of concurrent experiments running in ERF has grown from a few dozen (in 2014) to about 500… More impressively, the number of metrics computed per day has grown exponentially… Today we compute ~2500 distinct metrics per day and roughly 50k distinct experiment/metric combinations.”