Psychological Safety is Fundamental to DevOps

Roberto Javier Yudice Monico
Sep 30, 2020

When working with companies, I have often encountered teams that want to implement continuous delivery or continuous deployment but don't have the practices in place that give developers psychological safety. The result is continuous delivery implementations that produce mediocre results at best, because developers are afraid of breaking things.

As many of you may know, continuous delivery means that every commit is ready to be deployed to production at any time, and continuous deployment means that every commit is deployed to production automatically. Without the right practices in place, your developers will be in fight-or-flight mode all the time, afraid of breaking production, and instead of increasing productivity you will hinder it, because the environment doesn't feel safe. [The DevOps Handbook](https://www.amazon.com/DevOps-Handbook-World-Class-Reliability-Organizations-ebook/dp/B01M9ASFQ3) stresses how important safety is to productivity. It cites the case of Alcoa, the aluminum manufacturer whose CEO put safety above everything else: productivity increased significantly, and other key metrics improved as a consequence.

So how can you provide psychological safety to your developers in a continuous delivery or continuous deployment environment? Some DevOps practices address this, but I have found that they are very often not implemented properly. I discuss a few below.

Better Monitoring

A lot of companies think that centralizing logs with ELK or a similar solution is enough for monitoring. That may be true in a classic agile environment, but if you are doing continuous deployment it won't be enough: you will be relying on developers going into Kibana, or your preferred log visualization tool, to check that they didn't break anything in production, and that interrupts their flow.

A workflow I typically suggest is to send those logs to your chat tool of choice, such as Slack or Teams, but only error logs. This matters because if you send any other kind of log, people will eventually stop paying attention to the channel, and we don't want that.
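As a minimal sketch of the error-only filter, assuming a Python service that uses the standard `logging` module and a Slack-style incoming webhook that accepts a JSON body with a `text` field (other chat tools may need a different payload), a custom handler can forward only ERROR and CRITICAL records. The names `ChatErrorHandler` and `post_to_webhook` are hypothetical; if your logs already flow through ELK, the same ERROR-only filter could live in that pipeline instead.

```python
import json
import logging
import urllib.request
from typing import Callable, Optional


def post_to_webhook(url: str, text: str) -> None:
    """POST a plain-text message to a Slack-style incoming webhook (assumed payload shape)."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()


class ChatErrorHandler(logging.Handler):
    """Hypothetical handler that forwards only ERROR and CRITICAL records to chat."""

    def __init__(self, webhook_url: str,
                 send: Optional[Callable[[str, str], None]] = None) -> None:
        # Handler level ERROR: the logger skips this handler for DEBUG/INFO/WARNING
        super().__init__(level=logging.ERROR)
        self.webhook_url = webhook_url
        # `send` is injectable so the handler can be tested without a network
        self.send = send or post_to_webhook

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.send(self.webhook_url, self.format(record))
        except Exception:
            # A chat outage must never crash the service doing the logging
            self.handleError(record)
```

Attach it with `logging.getLogger("checkout").addHandler(ChatErrorHandler(WEBHOOK_URL))`, where `WEBHOOK_URL` is your channel's webhook. Filtering at the handler level keeps INFO and WARNING noise out of the channel, which is what keeps people reading it.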

With this in place, developers can feel safe after a deployment: if no notifications have appeared in the channel, they haven't broken anything (yet).

Traffic Mirroring

Service meshes make traffic mirroring much easier. If you are using Istio (or another service mesh), you can mirror production traffic to your dev environment, assuming you still have one. Your developers can then deploy to the dev environment and watch how their commit behaves under production traffic before promoting it to production. This assumes you are still in the continuous delivery phase rather than doing continuous deployment.
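To make the Istio case concrete, here is a sketch that builds a `VirtualService` manifest using Istio's `mirror` and `mirrorPercentage` route fields. The host names and the `mirrored_virtual_service` helper are hypothetical, and it assumes the dev service is reachable from the production mesh. The result is printed as JSON, which `kubectl apply -f -` accepts.

```python
import json


def mirrored_virtual_service(name: str, prod_host: str, dev_host: str,
                             percent: float = 100.0) -> dict:
    """Build an Istio VirtualService that serves from prod and mirrors to dev."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": [prod_host],
            "http": [{
                # Real users are only ever answered by the production service...
                "route": [{"destination": {"host": prod_host}}],
                # ...while a copy of each request goes to dev; mirrored responses are discarded
                "mirror": {"host": dev_host},
                "mirrorPercentage": {"value": percent},
            }],
        },
    }


if __name__ == "__main__":
    vs = mirrored_virtual_service(
        "orders",
        "orders.prod.svc.cluster.local",  # hypothetical production service
        "orders.dev.svc.cluster.local",   # hypothetical dev deployment of the new commit
        percent=10.0,
    )
    print(json.dumps(vs, indent=2))
```

Starting with a small `percent` limits the extra load on the dev environment while still exposing the commit to real request shapes.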

Canary Deployments

Blue/green deployments are dead. In a continuous deployment environment you need canary deployments and incremental rollouts, and the tools available today make these significantly easier than before.

You will also need an automatic rollback mechanism: hook the canary deployment tool into your metrics store, such as Prometheus or InfluxDB, so that the rollout is interrupted and rolled back automatically if, for example, the error rate in Prometheus increases.
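Dedicated canary tools implement this loop for you; the sketch below only illustrates the control logic, under stated assumptions. `set_weight` is a hypothetical hook into whatever splits traffic (for example a service-mesh route weight), and `error_rate` would typically call `query_prometheus`, which uses Prometheus' HTTP API instant-query endpoint `/api/v1/query`. The step weights and the 1% threshold are illustrative defaults, not recommendations.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Sequence


def first_sample(payload: dict) -> float:
    """Extract the first value from a Prometheus instant-query response."""
    result = payload["data"]["result"]
    # An empty vector (e.g. no canary traffic yet) is treated as 0.0
    return float(result[0]["value"][1]) if result else 0.0


def query_prometheus(base_url: str, promql: str) -> float:
    """Run an instant query via Prometheus' HTTP API (/api/v1/query)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=5) as response:
        return first_sample(json.load(response))


def run_canary(set_weight: Callable[[int], None],
               error_rate: Callable[[], float],
               steps: Sequence[int] = (5, 10, 25, 50, 100),
               max_error_rate: float = 0.01,
               wait: Callable[[], None] = lambda: None) -> bool:
    """Shift traffic to the canary step by step; roll back on a bad error rate.

    Returns True if the rollout completed, False if it was rolled back.
    """
    for weight in steps:
        set_weight(weight)  # percentage of traffic sent to the new version
        wait()              # let the canary serve traffic before judging it
        if error_rate() > max_error_rate:
            set_weight(0)   # automatic rollback: all traffic to the stable version
            return False
    return True
```

As a usage example, assuming a hypothetical `http_requests_total` counter with a `version="canary"` label, `error_rate` could run `sum(rate(http_requests_total{version="canary",code=~"5.."}[1m])) / sum(rate(http_requests_total{version="canary"}[1m]))`, and `wait` could be `lambda: time.sleep(60)`. Checking after every step means a bad commit is caught while it serves only a small slice of users.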

Investing More in Integration Tests

Unit tests are not enough for continuous delivery. Every commit has to be ready for deployment, and testing your changes in isolation doesn't tell you whether they break something else. I would argue that integration tests are even more important than unit tests.

Many companies have plenty of unit tests but very few integration tests. That has to change if you want your developers to feel confident that they did not break another component of your system.
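To show the difference from a unit test, here is a small sketch using only the standard library: a hypothetical "inventory" HTTP service and a hypothetical "orders" function that calls it are exercised together over a real local socket, so a change to either side's request or response format fails the test. In a real system you would start actual service builds (for example in containers) rather than an in-process stub.

```python
import json
import threading
import unittest
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical inventory data: GET /stock/<sku> -> {"sku": ..., "available": n}
STOCK = {"sku-1": 3, "sku-2": 0}


class InventoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        sku = self.path.rsplit("/", 1)[-1]
        if not self.path.startswith("/stock/") or sku not in STOCK:
            self.send_response(404)
            self.end_headers()
            return
        body = json.dumps({"sku": sku, "available": STOCK[sku]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def can_order(base_url: str, sku: str, qty: int) -> bool:
    """Hypothetical orders component that depends on the inventory service."""
    with urllib.request.urlopen(f"{base_url}/stock/{sku}", timeout=5) as resp:
        return json.load(resp)["available"] >= qty


class OrdersInventoryIntegrationTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Port 0 lets the OS pick a free port
        cls.server = HTTPServer(("127.0.0.1", 0), InventoryHandler)
        cls.base_url = f"http://127.0.0.1:{cls.server.server_address[1]}"
        threading.Thread(target=cls.server.serve_forever, daemon=True).start()

    @classmethod
    def tearDownClass(cls):
        cls.server.shutdown()
        cls.server.server_close()

    def test_in_stock(self):
        self.assertTrue(can_order(self.base_url, "sku-1", 2))

    def test_out_of_stock(self):
        self.assertFalse(can_order(self.base_url, "sku-2", 1))

    def test_unknown_sku_is_an_error(self):
        with self.assertRaises(urllib.error.HTTPError):
            can_order(self.base_url, "sku-404", 1)
```

Run it with `python -m unittest` as part of the CI pipeline so every commit is checked against the components it talks to, not just in isolation.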

Making Your CI Pipeline Visible

You want your developers to receive constant automated feedback on their changes so that they can maintain flow. Any feedback that requires developers to stop what they are doing will break that flow.

You may have a mature CI pipeline, but if developers have to log in to your CI frontend to check whether the build succeeded, it's not enough. Feedback has to be pushed to developers automatically, rather than developers having to pull it from your CI.
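One way to push feedback, assuming your code is hosted on GitHub, is to have the last CI step post a commit status via GitHub's REST endpoint `POST /repos/{owner}/{repo}/statuses/{sha}`, so the result appears directly on the commit and pull request the developer is already looking at. The helper names, the `ci/build` context, and the token source (a CI secret) are assumptions for illustration; posting to the chat channel from the monitoring section would work just as well.

```python
import json
import urllib.request

# Commit states accepted by GitHub's "create a commit status" endpoint
VALID_STATES = {"error", "failure", "pending", "success"}


def build_status_request(owner: str, repo: str, sha: str, token: str,
                         state: str, target_url: str, description: str,
                         context: str = "ci/build") -> urllib.request.Request:
    """Build a POST /repos/{owner}/{repo}/statuses/{sha} request."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {sorted(VALID_STATES)}")
    body = {
        "state": state,
        "target_url": target_url,    # link straight to the build log
        "description": description,  # short summary shown next to the commit
        "context": context,          # distinguishes this check from others
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def report_status(request: urllib.request.Request) -> int:
    """Send the status to GitHub and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Posting a `pending` status when the build starts and `success` or `failure` when it ends means the developer never has to go looking for the result.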

Wrap up

These are just a few of the many ways you can provide psychological safety to developers. Technology changes all the time, and I'm sure there are other, perhaps better, ways.
