How to avoid breaking production — Part 1
When a bug happens, how fast can it be fixed? As engineers, one of our biggest concerns is keeping the product available and reliable at all times. However, we also need to keep releasing new features without affecting the customers’ experience. In 2022, we cannot simply rely on automated unit testing as quality assurance; that alone is just not enough. There is a combination of techniques that can be used to make this happen, and that is what we will dig into today.
This article is authored by Lucas Tagliani and Thayse Onofrio, since we recently paired in a tech lead role.
First, let’s share some context:
We’ve spent the last 2.5 years working as developers and, at times, as tech leads.
Our current tech stack is mainly React and TypeScript in a micro frontend architecture, and we deploy software to production with Jenkins almost every day.
Okay, enough context for now. Let’s start digging into the steps we go through and the tools we use to deliver software, frequently and with confidence:
1. Static code analysis
It helps you avoid the most obvious bugs and code smells, and prevents human typos. It also keeps your code consistent across the source code and shows you where complexity is concentrated, so you can act before the code gets too hard to maintain.
The tools we currently use are eslint for almost all file extensions, TypeScript for .ts and .tsx files, and Sonar for the majority of the files as well.
Usually, after you set up a couple of rules, you are ready to take advantage of them in your pipeline — or even earlier, in your pre-commit hook, which is exactly what we do for type-checking and eslint verification: we simply cannot commit if there is any issue there. Then, Sonar scans our code as soon as it reaches the main branch on GitHub and notifies us of any issues as well.
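As a rough illustration, a pre-commit setup along these lines can run both checks automatically. This is only a minimal sketch assuming husky and lint-staged; the file name, globs and commands are illustrative, not our exact configuration:

```ts
// lint-staged.config.mjs — minimal sketch (illustrative globs and commands).
// A husky pre-commit hook running `npx lint-staged` would pick this up.
export default {
  // Type-check the whole project; tsc does not reliably check files in isolation,
  // so the function form ignores the staged file list and runs a single command.
  '**/*.{ts,tsx}': () => 'tsc --noEmit',

  // Lint (and auto-fix) only the files that are about to be committed.
  '**/*.{js,jsx,ts,tsx}': 'eslint --fix --max-warnings=0',
};
```

With something like this in place, a commit fails fast on the developer’s machine instead of waiting for the pipeline to flag the same problem.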
2. Unit and integration tests
It is not about what percentage we cover, but which pieces of the application we decide to cover — and we actually try to cover every single piece of it. That is why we have over 90% code coverage in our unit and integration tests. By default, the only files we don’t test this way are the configuration files.
Here is where we test not only the happy path, but also all the edge cases we can think of. These tests are usually cheap: they don’t take much time or many resources. This is why we can run almost 3 thousand tests in less than 2 minutes, and we do it locally before each push to any branch in our repository — we have acceptable reasons to use local resources instead of running them in our pull request pipeline. That choice is quite specific to our case, and we can talk about it later.
Jest and react-testing-library are the main tools we use at this level of testing.
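To make it concrete, here is a minimal sketch of what a test at this level looks like. The `Button` component, its props, and the file names are hypothetical, for illustration only:

```tsx
// Button.test.tsx — a sketch of a unit test with Jest and react-testing-library.
// The Button component and its props are hypothetical.
import React from 'react';
import '@testing-library/jest-dom';
import { render, screen, fireEvent } from '@testing-library/react';
import { Button } from './Button';

describe('Button', () => {
  it('calls onClick when the user clicks it (happy path)', () => {
    const onClick = jest.fn();
    render(<Button label="Save" onClick={onClick} />);

    fireEvent.click(screen.getByRole('button', { name: 'Save' }));

    expect(onClick).toHaveBeenCalledTimes(1);
  });

  it('is disabled while loading (one of the edge cases)', () => {
    render(<Button label="Save" loading onClick={jest.fn()} />);

    expect(screen.getByRole('button', { name: 'Save' })).toBeDisabled();
  });
});
```

Hundreds of small, isolated tests like this are what keep the whole suite fast enough to run before every push.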
We’ve also been using Test Driven Development (TDD) as much as possible, and it helps us write production code only when there is already a test covering that scenario. We can’t stress enough how much we enjoy TDD, and we can’t think of a better way to get fast feedback on our code, with the added benefit of the simpler design that comes with it.
3. User journey tests
Even though we make sure to cover our bases using the unit and integration tests, there are scenarios that are more complex and that we’ll only be able to reproduce and validate by actually running the application. For those, we run user journey tests using Cypress.
Cypress allows us to click buttons, input text, hover over elements, press keys, and perform many other actions just as an end user would. This is very similar to what react-testing-library provides, but Cypress does it while running the real application instead of just rendering a piece of it.
We don’t want to duplicate the tests we already have in the previous layer, so we choose the main features across our application and usually test only their happy path. These tests take time and consume a lot of resources, so it’s important to keep that in mind before adding many of them to this layer; otherwise, we would end up with a pipeline that takes a long time to run and to give us feedback.
Choose your user journey tests wisely. In our context, a test called “Modal should be closed when user clicks on the close button” is a good example of a test in this layer. We also run it in different viewports (Mobile and Desktop, for example) to make sure both stay consistent.
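As a rough sketch, that test could look something like the following. The route and the data-testid selectors are made up for illustration; they are not our real ones:

```ts
// modal.cy.ts — a sketch of the "Modal should be closed when user clicks
// on the close button" journey, run in a mobile and a desktop viewport.
const viewports: Cypress.ViewportPreset[] = ['iphone-x', 'macbook-15'];

viewports.forEach((viewport) => {
  describe(`Product details modal (${viewport})`, () => {
    beforeEach(() => {
      cy.viewport(viewport);
      cy.visit('/products'); // illustrative route
    });

    it('should be closed when user clicks on the close button', () => {
      cy.get('[data-testid="open-details-button"]').click();
      cy.get('[data-testid="details-modal"]').should('be.visible');

      cy.get('[data-testid="close-modal-button"]').click();
      cy.get('[data-testid="details-modal"]').should('not.exist');
    });
  });
});
```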
Our main goal is to get feedback as soon as possible, so we focus on having a lot of unit tests covering different possibilities and leave the user journey tests for only what’s needed. Every time we open a pull request to our main branch, we start a Docker container with our application inside it and run our user journey tests against the app. This is how we make sure that all our major scenarios keep working after any change.
4. End-to-end tests
Given that our application is a micro frontend, even when we execute user journey tests against our real application, we are covering just one piece of the whole website. Our end-to-end (E2E) tests are there to cover scenarios across applications. A good example is “User should be able to search, select, pay and get an estimated time of arrival of a product”.
In our case, a separate team owns this layer of tests, and instead of running inside each application’s pipeline, they run periodically every day.
These tests are hard to maintain because you need at least some understanding of each of the applications involved and must make sure the contract (usually HTML selectors, in our frontend environment) still holds even after big changes are made to the applications individually. Dedicated, open channels between the teams could be a good way to keep that communication clear and effective.
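One way to make that contract explicit is to pin a stable attribute on the elements the E2E suite depends on. This is only a sketch of the idea, with illustrative names, not something taken from our codebase:

```tsx
// AddToCartButton.tsx — the data-testid below is the "contract" a
// cross-application E2E suite might rely on, so it should survive refactors
// even when class names, copy, or layout change. All names are illustrative.
import React from 'react';

type Props = { onAdd: () => void };

export const AddToCartButton = ({ onAdd }: Props) => (
  <button data-testid="add-to-cart" onClick={onAdd}>
    Add to cart
  </button>
);

// The E2E suite (owned by another team) would then target the element through
// that attribute, e.g. cy.get('[data-testid="add-to-cart"]').click(),
// and keeps passing as long as the attribute is preserved.
```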
5. Visual regression tests
Before we get into details on how we use it, let’s explain the basics, as visual regression tests might not be that common. Visual regression tests catch UI bugs by taking screenshots of your components and comparing them to an expected version of the same components. The first time you configure it, the tool takes screenshots of your components and calls them the “baseline”, which is considered the correct version. From that moment on, every new screenshot is compared to the baseline. If there are differences, depending on the level of comparison you chose, the tool fails the checkpoint and marks it as unresolved, which might break your pipeline as well. That is what we want, as it prevents unexpected changes from slipping into your code. If the change is expected, you simply approve it and run the check one more time. If it is not, you fix it before pushing again.
Part of our application’s goal is to provide components to be used by other teams, which means documentation, consistency and compatibility are very important for us. We’ve been providing it using two main tools:
- Storybook: where we demonstrate all our exportable components through stories that other teams can try out and easily tweak. We also provide a good description of how to use each component, including specific corner cases.
- Applitools: this service takes a screenshot of each of our stories individually, compares it with our baseline, and lets us know if there is any difference in color, margin, font-size or anything else (see the sketch after this list).
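For reference, a story is just a small module that renders a component in a given state. A minimal sketch, using the same hypothetical Button component and illustrative names, could look like this:

```tsx
// Button.stories.tsx — a minimal Storybook story file (illustrative names).
// Each exported story becomes one screenshot in the visual regression run.
import React from 'react';
import { Button } from './Button';

export default {
  title: 'Components/Button',
  component: Button,
};

export const Primary = () => <Button label="Save" variant="primary" />;
export const Disabled = () => <Button label="Save" disabled />;
```

In a setup like this, the Applitools Storybook SDK walks through every story, screenshots it in the configured viewports, and compares each screenshot against the baseline.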
We currently take more than 50 screenshots of desktop stories and more than 40 screenshots of mobile stories in every single pull request we open. This has helped us a lot to avoid shipping unintended UI changes that the human eye would probably never catch.
If you would like to try this tool out, it is free for one user and a couple of checkpoints per month; beyond that, you will need to pay for it.
You could also use the regular snapshot testing approach, since it keeps track of your components’ markup and styles as well. We had it before, but we noticed it was too easy to update snapshots by mistake, which is why we moved to a more robust solution.
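For comparison, a snapshot test is just a couple of lines (again a sketch with the hypothetical Button), and a single `jest -u` run, or pressing “u” in watch mode, silently rewrites the stored snapshot, which is exactly how unintended changes can slip through:

```tsx
// Button.snapshot.test.tsx — a sketch of the plain snapshot approach.
// The rendered markup is serialized and stored next to the test; any diff
// fails the test until the snapshot is either fixed or updated.
import React from 'react';
import { render } from '@testing-library/react';
import { Button } from './Button';

it('matches the stored snapshot', () => {
  const { container } = render(<Button label="Save" variant="primary" />);
  expect(container.firstChild).toMatchSnapshot();
});
```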
Getting it all together
Now that you know all the practices we use before production deployments, we summarized them all in a table, so you can see in which layer we run each step and how long it takes.
NOTE: the last row takes into account not only the time spent in the quality layers, but also other actions such as building, installing, and publishing.
In the second part of this article, we’ll take a look at other practices that help us ensure production stays up and running during and after deployment. Those include the frequency of deploys, deployment strategies such as blue/green or canary releases, feature flags, A/B tests, observability, and our on-call model.
What is the path to production in your current project? What are the differences? Which layers do you have that we don’t?