Thread is an online personal styling service. When you sign up, their website asks you a set of introductory questions about your preferred styles, sizes, budget, owned items and so on, which their personal stylists use to provide surprisingly well-tailored recommendations. Headquartered in London with around 50 staff, Thread's Django-powered e-commerce website receives approximately 850k visits a month.
Rhett: What does Thread's site architecture look like?
Dan: Thread has been strongly powered by Django from day one. The founding team have a lot of experience with Django and so the project was architected to scale as it increased in complexity. Our recommendations service is built with Flask using scikit-learn (there's a lot of data science used at Thread), and we've also got some smaller services that power state machines and quite a few monitoring services, but mostly it's a Django monolith. Monolith gets a bad rap as a term but I think Django lends itself very nicely to dividing things up into apps and creating this nice hierarchy of logic and models.
Rhett: Why did you decide to migrate to Python 3?
Dan: I've been at Thread for four years now and it was fairly clear after the first year that Python 2 was going to disappear eventually. I think for a long time there was this limbo where people weren't convinced that Python 3 was the future and the community was considering running both versions long term, in a way I guess that's kind of what's been happening to date. I remember the first PyCon UK I went to in 2015 and that year a lot of people were talking about migrating their bigger codebases to Python 3. The year after that not many people were talking about it, a lot of people seemed to have just done it.
By 2017, we had very few dependencies that were compatible with Python 2 only. These were either very small or had Python 3 forks that we could move to. The Python 2 End of Life date had been confirmed (1 Jan 2020) and Django had committed to going Python 3 only too. We also run our systems on Debian and they had announced an end date for Python 2 support which was a year earlier. This was because of their release cycle: they didn't want to have a period during which they were supporting Python 2 while it was no longer officially supported by the Python team.
So I guess we could have left the migration until next year but we put a lot of effort into making sure that we have nice code, that we are writing code in good long-term maintainable ways and ultimately that's what Python 3 really does for Python. It makes it easier to write good code and so there were places where we were feeling the pain of not having that, particularly with things like type systems, which had to be managed with mypy, and a lot of the Unicode support which we've never had to deal with full on because we're only available in the UK currently. However, every now and then something would crop up and we'd get a product that had a Unicode symbol in it that we hadn't catered for somewhere or we'd ingest a CSV file from some supplier and it wouldn't work.
We have something called Tech 10% at Thread which is like Google's 20% time, so Wednesday afternoons is when developers can work on what they think is important. So during that time I took on the task of starting to move us to Python 3.
Rhett: How did you plan and prepare to migrate to Python 3?
Dan: Funnily enough, one of the things we didn't do as well as we could have was plan properly. As the preparation was only happening in the Tech 10% time we didn't have the whole team working on it and it didn't get the thorough planning that any other large piece of tech work at Thread would get. That's something that I think we've learnt for the future.
Having said that, we did have a plan. We started by getting our dependencies up to date so that they would work on Python 3. After that, we started to adapt our coding practices so that they would create forward-compatible code, we used quite a few Flake8 linters and other plugins for that.
For example, there is a plugin for Flake8 that allows you to ban certain imports, so we went through the list of things that the six library has under six.moves. Where things have moved in the standard library, you can use six.moves as a compatible way to import them on both Python 2 and Python 3. So we banned all of the Python 2 imports and moved everything over to six, which meant that we weren't importing from the wrong place, and we could do that quite incrementally. A lot of those things started to come in incrementally and each week we'd do a small PR that would just change a couple of things. I think that was probably several months of work but it was only a few hours here and there and over several months we started to get to the point where the code looked a lot more like Python 3 code.
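To illustrate the import-banning idea, here is a minimal standard-library sketch of the kind of check such a linter performs. It is not the Flake8 plugin Dan mentions and not Thread's configuration: the `BANNED_IMPORTS` table and the `find_banned_imports` helper are hypothetical names, and the table covers only a few of the modules that six.moves provides replacements for.

```python
import ast

# Hypothetical table: Python 2-only module -> suggested six replacement.
BANNED_IMPORTS = {
    "urllib2": "six.moves.urllib.request",
    "urlparse": "six.moves.urllib.parse",
    "ConfigParser": "six.moves.configparser",
    "cPickle": "six.moves.cPickle",
}


def find_banned_imports(source, filename="<string>"):
    """Return sorted (line, module, suggestion) tuples for banned imports."""
    problems = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Match on the top-level package so "urllib2.foo" is caught too.
            top_level = module.split(".")[0]
            if top_level in BANNED_IMPORTS:
                problems.append((node.lineno, module, BANNED_IMPORTS[top_level]))
    return sorted(problems)


sample = "import urllib2\nfrom six.moves import configparser\n"
for line, module, suggestion in find_banned_imports(sample):
    print("line %d: %s is banned, use %s" % (line, module, suggestion))
```

Running a check like this in CI is what makes the incremental approach workable: once a module is on the banned list, new code cannot reintroduce the Python 2-only import.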
The stage after that was getting our tests to run. Our tests didn't even run on Python 3 to begin with, not to mention pass. So when we had all the linters in place and we had fixed all of those issues the tests were a lot closer to running, then there were just a few little bugs that we could sort out to get them running. I think the first time they ran they were mostly alright, we probably had 80% of our tests passing and so that kicked off three or four weeks of using Tech 10% time to get all of the tests passing.
Most of the time it was a lot of CSV handling and stuff like that. We have to deal with several third-party warehouses and shipping companies who all have FTP servers with CSV files on them; they've never heard of an HTTP API, so that was fun. Once we fixed up the tests we found places where we needed more tests. We would find a particular bug in, say, one CSV system and then we'd learn from that and go and write tests for the same thing in all of our other ones just to make sure that we weren't having the same issues all over the code base.
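A typical source of CSV breakage in this kind of migration is that Python 3's csv module works on text, while a file fetched over FTP arrives as bytes. The sketch below shows one way to decode explicitly before parsing. The `read_csv_rows` helper and the UTF-8 default are assumptions made for illustration; the interview does not say which encodings Thread's suppliers use or how the code was actually fixed.

```python
import csv
import io


def read_csv_rows(raw_bytes, encoding="utf-8"):
    """Decode raw bytes explicitly, then parse them as CSV rows of str."""
    # newline="" leaves line-ending handling to the csv module.
    text_stream = io.TextIOWrapper(
        io.BytesIO(raw_bytes), encoding=encoding, newline=""
    )
    return list(csv.reader(text_stream))


# Hypothetical supplier payload containing a non-ASCII product name.
payload = "sku,name\n123,Caf\u00e9 shirt\n".encode("utf-8")
for row in read_csv_rows(payload):
    print(row)
```

A test that feeds each CSV integration a payload with a non-ASCII character, as above, is the sort of check that can be copied across every supplier integration.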
Once the tests were passing we added another CI job to Jenkins that would run all of the tests on Python 3 and make sure that we could build the package. Then the idea was to essentially treat the Python 3 build as a first-class citizen.
We had originally expected it to be just a couple of weeks of the Python 3 build being stable before we'd start to ship it but we knew that the changeover was going to be a big risk and so we wanted to plan ahead for it. We wanted to make sure we'd do it at a time where we'd have enough developers in the office if needed, and do it at a time that wasn't a critical sales period. So we ended up putting it off for about three and a half months and keeping Python 2 and Python 3 compatibility during that time.
That was error-prone and a bit of a pain. I'd been running Python 3 in my dev environment for several months and about a month or so after we had Python 3 compatibility all of the development team were using Python 3 in their dev environments. That then resulted in flaky builds on Python 2, as we'd write something that only worked on Python 3 without realising it, then push it live and the build would fail. So that caused a bit of friction, it started to cost us time and so we put a bit more pressure on getting Python 3 shipped and eventually set the date on a Tuesday in November.
I say we didn't want the changeover to happen in a high sales period but we actually ended up doing it the week of Black Friday, which was another one of the things we learnt from this. We communicated about it really well within the tech team but we didn't communicate well with everyone else. I think we were a bit more confident about it than perhaps we should have been.
The rest of the company didn't know there was a high risk of it going wrong and so we came in one Tuesday morning and tried to push it live. It didn't work and we ended up realising by about 8 am that it wasn't going to work that day and so we rolled back. Then we came in early again the next day and tried again. A few things went wrong but less so and we ended up pushing through and just fixing issues as they came up and we've been on Python 3 since.
Rhett: Did you have any teething issues when migrating over to Python 3?
Dan: The main one that caused us to roll back on the Tuesday was basically an issue between our sessions and our caching system. It meant that all of our users would have been logged out and would have to log back in again. We have our sessions stored in signed cookies, and we store some of our user data in Memcached as pickled Python classes. When we moved to Python 3 there were some incompatibilities, down to Unicode issues again, that resulted in a difference between bytes and strings and it meant that we weren't able to validate the cookies that people had against the data we had stored in Memcached.
The Thread experience is mostly about browsing outfits and learning about how to dress, so it's a much better user experience for us to have a relatively long expiry time on our cookies. We know that people dip in and out of browsing, so in terms of user experience, logging everyone out wasn't something we thought was reasonable, and that's why we decided not to go ahead on Tuesday and spent the day writing some forward compatibility into our caching and signing of cookies so that when we launched it the next day we wouldn't have that issue.
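The interview does not describe what that forward-compatibility code looked like. As a generic illustration of the bytes-versus-strings problem, the sketch below normalises every input to bytes before signing, so a value signed as text and the same value signed as bytes produce one signature. The `to_bytes`, `sign` and `verify` helpers are hypothetical; this is neither Django's signing API nor Thread's fix.

```python
import hashlib
import hmac


def to_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


def sign(value, secret):
    """Return a hex HMAC-SHA256 signature, whatever the input types."""
    digest = hmac.new(to_bytes(secret), to_bytes(value), hashlib.sha256)
    return digest.hexdigest()


def verify(value, signature, secret):
    """Check a signature in constant time, accepting str or bytes."""
    expected = to_bytes(sign(value, secret))
    return hmac.compare_digest(expected, to_bytes(signature))


token = sign("user:42", "not-a-real-secret")
print(verify(b"user:42", token, "not-a-real-secret"))
```

Normalising at the boundary like this means values written by one interpreter version can still be validated by the other during the changeover.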
Part of the rollout plan was that we'd take everything down, then bring the web servers back up along with the really critical queues, like our checkout processing queue for example. Then the idea was that throughout the day we would bring up the other queues one at a time and watch the logs for errors. Our warehouse opens at 8 am and all the software they use is our Django app and so it had to be working by then, which it was, and so that was a success. But as we started to bring queues up we found there were still a fair number of things that weren't working exactly how we'd hoped they would and we ended up finding out the next day we had caused a lot of items to go out of stock, post order.
Post order out of stock is when someone has bought something and then we email them to say that we actually don't have it. It usually only happens in very rare circumstances and we have to refund the customer; it's not a good experience at all. It's a metric we track very closely and we try to minimise it. During this time it spiked and it turned out that some of the ways we had our queues running as serial processes caused an issue where we were running multiple copies of the same queue. They were all churning through things very quickly, causing issues, throwing errors and marking lots of stuff as out of stock. Our ops team probably spent about two or three days cleaning up the aftermath of that. That was probably the biggest issue we had.
We run meetings called 5 Whys when something goes wrong on a big scale to find out the root cause of the issue. We ended up running a 5 Whys for this issue and we ultimately concluded that the root cause was that, as a tech team, we didn't communicate all of the risks and everything that might be affected well to the rest of the team. If we had done that then maybe they would have been able to spot issues sooner, maybe if we'd mentioned the risks they would have said don't do this on Black Friday or they would have seen some of the systems it was going to touch and said "actually, you might want to write some tests for this particular area because that is a very key thing for us".
So it wasn't a trouble-free launch but by the end of the week the only issues we were having were minor things like reports that only get generated once a week and we could very easily fix that and rerun the report. We haven't really had any problems since.
Rhett: What would you recommend to people who are preparing to undertake the same migration?
Dan: I would say plan out the steps for how you're going to get your codebase compatible with Python 3. Invest in tooling that's going to help you know that it's compatible. Write tests: if you haven't got tests on certain areas of your code base and you know that they deal with data files that might be in different formats or things that Python 3 has notably changed, I'd say write more tests for those sorts of things. We thought we had pretty good test coverage but there were certain bits where we found we were lacking. Invest in tools like linters and set up parallel builds of your software that run on Python 3.
Plan for how you're going to get to Python 3 compatible in a way that doesn't interrupt the rest of your dev team. That was something that I think we did get right, by the time I went to developers and said "Hey, do you want to use Python 3 on your machine?" everything mostly just worked and that meant that the rest of the team thought it was a good thing and bought into it.
I think the other thing would be to plan the launch, particularly if you've got a web service, one where it really matters that you're up and running and that people can transact. So plan step by step very carefully what you're going to do and at any point within that how you can realise that it's not working and how you're going to roll back from that to a known working state.
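One way to make that advice concrete is to write the launch down as an ordered list of steps, each with its own health check and rollback. The sketch below is a generic illustration, not Thread's deployment tooling: `run_rollout` and the step dictionaries are hypothetical, and real steps would call deployment scripts rather than append to a list.

```python
def run_rollout(steps):
    """Apply steps in order; if a check fails, roll back in reverse order.

    Each step is a dict with "name", "apply", "check" and "rollback" keys.
    Returns (succeeded, log), where log records what happened.
    """
    applied = []
    log = []
    for step in steps:
        step["apply"]()
        applied.append(step)
        log.append("applied " + step["name"])
        if not step["check"]():
            log.append("check failed for " + step["name"])
            # Undo everything applied so far, most recent first.
            for done in reversed(applied):
                done["rollback"]()
                log.append("rolled back " + done["name"])
            return False, log
    return True, log


running = []  # stands in for real services in this example
steps = [
    {"name": "web servers",
     "apply": lambda: running.append("web"),
     "check": lambda: "web" in running,
     "rollback": lambda: running.remove("web")},
    {"name": "checkout queue",
     "apply": lambda: running.append("checkout"),
     "check": lambda: "checkout" in running,
     "rollback": lambda: running.remove("checkout")},
]
succeeded, log = run_rollout(steps)
print(succeeded, log)
```

Writing the checks down in advance forces the question of how you would notice that each step has failed, which is the part that is easy to skip.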
Communicate outside of your engineering team, make sure that people know what all the risks are, what might stop working and what they need to be watching. Make sure that everyone knows how to identify that something isn't working because sometimes it's difficult to know, particularly if you've got a large system.
If you're interested in learning more about Thread then check out Thread Engineering. They're also hiring!
Lessons Learned From Migrating to Python 3 was originally published in Hacker Noon on Medium, where people are continuing the conversation by highlighting and responding to this story.