Pokemon Go and the art of customer satisfaction

Matt MacDonald-Wallace, DevOps Engineer, DevOpsGuys, explains what DevOps lessons can be learned from the hit mobile game.

Everyone has been surprised by the recent success of Pokemon Go, and that seems to include its creators!

‘Going viral’ is not always easy to predict (even when it might be your goal) but your business can take advantage of automation within the cloud to ensure that your website or online service can deal with the demands of rapid growth in visitors or users of your app.

It has long been held within the IT industry that Google and its subsidiaries know how to scale. After all, they generate at least 40% of the traffic on the internet and run Gmail, Google Search and Google Docs, which companies such as Roche Pharmaceuticals and BBVA Financial Services rely on the world over for mission-critical IT services. So why has the launch by Niantic (the Google-owned company that runs the platform powering Pokemon Go) been such a disaster when it comes to running at scale?

Shortly after launch, the Pokemon Go servers went offline worldwide for well over five hours, leading to an outcry on social media. Whilst there are claims that a hacker group took the service offline via a Distributed Denial of Service (DDoS) attack, we find it hard to believe that the outage had nothing whatsoever to do with the decision to launch in another 26 countries some six hours previously.

If your infrastructure is in a traditional data centre that you own and manage, or runs on hardware that you co-locate in someone else’s data centre, then it’s more than likely that you will have problems scaling in the event of the Slashdot Effect or a successful major sales campaign.

The primary reason for this is that even if you are running a virtualisation platform such as VMware on your infrastructure, eventually you will saturate your existing hardware and need to place a frantic call to your hardware supplier in the hope that those 10 or 20-day lead times can be cut drastically!

If you migrate out to a cloud provider such as AWS or Microsoft Azure, then the following three tricks could help you cope with even the heaviest spikes in traffic:

Use containers for the stateless parts of your application

If you have parts of your website or web service that are stateless (i.e. they don’t rely on shared storage and they follow the ‘cattle not pets’ philosophy), then you can migrate them out to containers that can be spawned and deleted as required, hosted in a ‘container services’ environment such as those offered by AWS or Azure. This will allow you to configure how many containers are running at any given time, and how your infrastructure should react to sudden spikes in traffic (helpful hint: spawn more containers whilst busy, delete containers when quiet).
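The ‘spawn more whilst busy, delete when quiet’ rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular provider implements it: the function name, the per-container capacity figure and the minimum and maximum limits are all assumptions made up for the example, and in practice you would express the same rule in your provider’s own scaling configuration.

```python
import math


def desired_container_count(requests_per_second: float,
                            capacity_per_container: float,
                            min_containers: int = 2,
                            max_containers: int = 50) -> int:
    """Return how many stateless containers to run for the observed load.

    All names and defaults here are hypothetical, chosen for illustration.
    """
    if capacity_per_container <= 0:
        raise ValueError("capacity_per_container must be positive")

    # Enough containers to serve the current load, rounded up.
    needed = math.ceil(requests_per_second / capacity_per_container)

    # Never drop below a safety floor, never exceed a cost ceiling.
    return max(min_containers, min(max_containers, needed))
```

The floor keeps a little capacity warm for the start of a spike, and the ceiling stops a runaway bill if the traffic turns out to be an attack rather than customers.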

Learn how to take advantage of auto-scaling at your provider

Both Azure and AWS provide a method of auto-scaling virtual instances for those parts of the infrastructure that cannot be turned into containers for whatever reason. Learn how to use these tools and couple them with Azure Resource Manager (ARM) templates or AWS CloudFormation to deploy and scale multiple instances when traffic surges (Pete Mounce’s talk at WinOps 2015 is well worth a look to see how Just Eat do this!).

Get your developers and operations engineers talking to each other (aka DevOps)

We talk to a lot of people who seem to be tired of hearing how much DevOps is about culture and not about tooling, and yet by encouraging developers and operations engineers to talk to each other, you can start to find some incredibly creative ways to enable an application to scale, along with a real excitement about pushing the envelope.

At this year’s WinOps, Pete Mounce (again!) talked about DDoS’ing yourself every night in production to prove that you can handle the traffic that your customers might throw at you. We think it’s worth a try, and we have seen engineering teams rise to this kind of challenge with excellent results.
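As a rough idea of what a self-inflicted load test might look like, here is a minimal Python sketch that fires a batch of concurrent requests at a URL and reports failures and latencies. It is not the tooling described in the talk; the function names, default numbers and report fields are our own assumptions, and it should only ever be pointed at a service you own.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, timeout=5.0):
    """Send one GET request and return (succeeded, seconds_taken)."""
    start = time.perf_counter()
    try:
        with urlopen(url, timeout=timeout) as response:
            response.read()
            ok = 200 <= response.status < 400
    except (URLError, OSError):
        # HTTP errors, refused connections and timeouts all count as failures.
        ok = False
    return ok, time.perf_counter() - start


def run_load_test(url, total_requests=200, concurrency=20, request=fetch):
    """Send total_requests requests, concurrency at a time, and summarise.

    `request` is swappable so the summary logic can be tried without a network.
    """
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(request, [url] * total_requests))

    latencies = sorted(seconds for _, seconds in results)
    return {
        "requests": total_requests,
        "failures": sum(1 for ok, _ in results if not ok),
        "median_seconds": latencies[len(latencies) // 2],
        "slowest_seconds": latencies[-1],
    }
```

Run on a schedule against your own service, something like `run_load_test("https://staging.example.com/health")` (a placeholder address) gives you a nightly answer to whether failures or latency creep up under load, long before your customers find out for you.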

Hopefully this blog post has given you some helpful pointers on how to make use of cloud technologies to avoid the issues that have plagued Pokemon Go.

Edited for web by Cecilia Rehn.