When you fail to scale across teams…

By Director of Product Management Jon Noronha

Photo by rawpixel on Unsplash

This post originally appeared on the Optimizely Blog. Optimizely, Hacker Noon’s weekly sponsor, is the world’s leader in digital experience optimization, allowing businesses to dramatically drive up the value of their digital products, commerce, and campaigns through its best-in-class experimentation software platform. Optimizely enables product development teams to accelerate innovation, lower the risk of new features, and drive up the return on investment from digital by up to 10X.

This is the final post in our Product Experimentation Pitfalls blog series, written by Optimizely’s Director of Product Management, Jon Noronha. See here for more information on this 5-part series.

So far in this series, I’ve highlighted several pitfalls on the path to product experimentation. We’ve seen how some common mistakes can slow down your testing, stall your velocity, send your team in circles, or lead you down the wrong path entirely. But what if you already have a budding culture of experimentation and you’re seeing a steady stream of successful tests?

Congratulations! You’re in for an exciting ride. Great experiments kick off a virtuous cycle: they inspire others to question their own assumptions, and that in turn locks in a culture of hypothesis thinking that spreads through an organization. I’m often amazed by how quickly a company can go from 1 experiment a month, to 10, to 100.

Unfortunately, this progression is never perfectly smooth. As you run more tests, collect more data, and include more people in the process, a whole new set of challenges starts to appear. This final post is all about those pitfalls — the scaling challenges that can stop a growing testing culture in its tracks. To illustrate them, I’ll be sharing examples from two titans of the testing world: Booking.com and Airbnb.

Experimenting in more places

Booking is a company that’s famous for designing its entire development culture around constant experimentation. But like every testing program, they started with humble beginnings. The first version of their experimentation platform only supported running a handful of experiments at a time, and when they got started a decade ago the team couldn’t imagine the scale they’d eventually hit. Now fast forward to the present.

Overall, on a daily basis, all members of our departments run and analyse more than a thousand concurrent experiments to quickly validate new ideas. These experiments run across all our products, from mobile apps and tools used by hoteliers to customer service phone lines and internal systems. Experimentation has become so ingrained in Booking.com culture that every change, from entire redesigns and infrastructure changes to bug fixes, is wrapped in an experiment… Such democratization is only possible if running experiments is cheap, safe and easy enough that anyone can go ahead with testing new ideas, which in turn means that the experiment infrastructure must be generic, flexible and extensible enough to support all current and future use cases.

Through all this exponential growth, Booking has had to continuously revamp its testing infrastructure to support new kinds of experiments. At each stage of growth, new teams brought new touchpoints and new challenges. From website conversion optimization, they expanded to native mobile testing, and from there to experiments deep in the backend technology stack. Today, Booking has a team of over 40 developers and statisticians continuously working on improving their testing platform.

At Optimizely, we’ve taken our own version of this journey. Like Booking, we’ve devoted 8+ years and tens of millions of dollars in R&D to expand our “experimentation footprint.” From simple website testing, we’ve followed the lead of the savviest teams to add native mobile experimentation and server-side testing across 10 different languages. Along the way, we’ve had to work through maddeningly subtle problems. Experiments have to work everywhere, but they also have to be rigorous, performant, secure, compliant, consistent, and reliable. If any one of those things goes wrong, in any part of the stack, it can instantly undermine years of trust built up in a testing culture.

If there’s one lesson I can distill from that journey, it’s this: don’t choose your testing platform based on the first experiment you want to run, or even the tenth. Think about where your hundredth or thousandth test might run — and make sure that you’re building or buying technology that can scale far beyond it.
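To make the "consistent" requirement above a little more concrete: one common way to keep a user in the same variation across web, mobile, and server-side code is to derive the assignment from a deterministic hash instead of stored state. The sketch below is purely illustrative and is not Optimizely's or Booking's implementation; the function name `assign_variation`, the key format, and the even split between variations are all assumptions made for the example.

```python
import hashlib
from typing import Sequence


def assign_variation(user_id: str, experiment_key: str,
                     variations: Sequence[str]) -> str:
    """Hypothetical helper: map a user to a variation deterministically.

    The same (user_id, experiment_key) pair always yields the same
    variation, on any platform that implements the same hashing rule.
    """
    # Including the experiment key keeps assignments independent
    # between experiments for the same user.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and pick a bucket.
    # Assumes an even traffic split across the listed variations.
    bucket = int.from_bytes(digest[:8], "big") % len(variations)
    return variations[bucket]


# Usage: repeated calls agree, wherever they run.
first = assign_variation("user-42", "checkout_redesign", ["control", "treatment"])
second = assign_variation("user-42", "checkout_redesign", ["control", "treatment"])
print(first == second)
```

Because nothing is stored, any service in the stack can recompute the assignment on its own, which is one reason this style of bucketing tends to hold up as the number of touchpoints grows.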

Collecting more data

Scaling experimentation doesn’t just mean supporting new use cases: it also means gathering more data. A lot more data. Enough to take your cleverly designed analytics pipeline and blow it up, over and over again. I’ve lived through this particular challenge three or four times now, so in a strange way, it was comforting to read about Airbnb’s long saga of scaling their Experiment Reporting Framework (ERF):

The number of concurrent experiments running in ERF has grown from a few dozen (in 2014) to about 500… More impressively, the number of metrics computed per day has grown exponentially… Today we compute ~2500 distinct metrics per day and roughly 50k distinct experiment/metric combinations.

Publication date

07/16/2018 - 19:19

Author

Hackernoon
