Google and Netflix team up to launch a new open source canary analysis tool


Google and Netflix today announced the launch of Kayenta, a new open source project that aims to bring the canary analysis tools Netflix developed internally to a wider audience. Kayenta is integrated into the Netflix-incubated Spinnaker continuous delivery platform, which works across virtually every public and private cloud. While Spinnaker is the focus of this release, Kayenta can also be adapted to other environments.

The general idea behind canary analysis is pretty straightforward. As the name implies, this is an early warning system that is all about preventing major issues when you roll out an update to a service or your infrastructure. As you roll out an update to a subset of users (or servers, or parts of your network), the canary analysis service checks whether the new system behaves as it should, or at least as well as the old one. At every step, the system performs its checks and ensures that you don’t roll out an upgrade that may pass all of your regular tests but creates issues when thrown into a more complex production system.
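To make the step-by-step idea concrete, here is a minimal Python sketch of a staged rollout that stops as soon as a check fails. It is only an illustration, not how Kayenta or Spinnaker implement it: the function name `staged_rollout`, the traffic percentages and the `canary_is_healthy` callback are all hypothetical.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[int],
    canary_is_healthy: Callable[[int], bool],
) -> int:
    """Walk through rollout stages, checking the canary at every step.

    `stages` lists the share of traffic (in percent) sent to the new version
    at each step. `canary_is_healthy` is a hypothetical check supplied by the
    caller. Returns the last percentage that passed, or 0 if none did.
    """
    reached = 0
    for percent in stages:
        if not canary_is_healthy(percent):
            # Stop here: the update never reaches the remaining stages.
            break
        reached = percent
    return reached


# Example: a made-up check that starts failing once half the traffic is moved.
last_good = staged_rollout([1, 10, 50, 100], lambda percent: percent < 50)
```

In this example the rollout halts before the 50 percent stage, so `last_good` is 10 and most users never see the faulty update.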

As Google product manager Andrew Phillips told me, a lot of developers already do this, but it’s often a rather informal process. Teams often build their apps, deploy them to a few servers, wait for a few minutes and then check their dashboards to look for obvious issues. That introduces the chance of human error and brings in the potential for bias. A canary analysis system, on the other hand, can evaluate the metrics and then (ideally) make an objective decision on whether the code is ready to ship or not. While most companies run automated tests to check their code for obvious errors, that kind of testing is often not enough when you want to put your code into production, especially if that production environment consists of a set of microservices that may end up interacting with each other in unexpected ways.
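What an automated, repeatable decision might look like can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not Kayenta's actual judging logic: the function `judge_canary`, the metric names, the 10 percent tolerance and the 0.75 pass score are all made up, and every metric is assumed to be "lower is better."

```python
from statistics import mean
from typing import Dict, List, Tuple


def judge_canary(
    baseline: Dict[str, List[float]],
    canary: Dict[str, List[float]],
    tolerance: float = 0.10,   # assumed: canary may be up to 10% worse
    pass_score: float = 0.75,  # assumed: share of metrics that must pass
) -> Tuple[float, bool]:
    """Compare canary metrics to the baseline and return (score, ship?).

    Both arguments map a metric name to its samples. A metric passes when the
    canary's average is no more than `tolerance` above the baseline's average.
    """
    if not baseline:
        raise ValueError("need at least one metric to judge")
    results = []
    for name, base_samples in baseline.items():
        limit = mean(base_samples) * (1 + tolerance)
        results.append(mean(canary[name]) <= limit)
    score = sum(results) / len(results)
    return score, score >= pass_score


# Hypothetical numbers: error rate holds steady, latency regresses badly.
baseline = {"error_rate": [0.01, 0.01], "latency_ms": [100, 110]}
canary = {"error_rate": [0.01, 0.011], "latency_ms": [150, 160]}
score, ship = judge_canary(baseline, canary)
```

With these numbers only one of the two metrics passes, so the score falls below the threshold and the release is held back, with no dashboard-watching involved.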

As is so often the case these days, the Netflix team wants to open up its own system to bring the service to the wider community (and in return benefit from the community’s advances, too). Inside Netflix, the system grew rather organically, which doesn’t necessarily make for good code, so Netflix and Google rewrote the parts of Kayenta that were specific to Netflix and spent some time cleaning up the rest. Indeed, as Netflix director of delivery engineering Andy Glover told me, the two teams spent about a year getting the code ready for today’s release, and one of the major areas of focus for both was making sure that the code was as modular as possible.
