The Metrics: Test Failure Rate
Last time we talked about counting the number of tests. That’s a pretty good indicator of how the team is adopting the new practice of writing tests. But how do they treat those tests? Do they see tests as a helpful technique for improving their code quality? Or do they write tests because they follow the scrum master’s orders?
(If there’s a master, there are probably slaves, right? Never mind).
So let’s take a look at our next metric: Rate of failing tests.
Should I track it?
Oh yeah. All the time. Its meaning changes over time, though.
Why should I track it?
There will be failures. That’s why we write tests – to warn us if we’ve broken something. But then we review those failures and discuss them, and we expect results: the overall test failure rate should drop (to zero if possible), and existing test failures should be fixed quickly. Sometimes the review itself affects the rate – for example, the team may decide that flaky tests should be handled differently. But overall, the metric tells us about attitude.
What does the metric mean?
Remember, we’re counting those failures in our CI system. We expect very few test failures, because we expect developers to push tested code. The tests should have run on their machines prior to pushing.
Tests that consistently pass in CI mean code quality is respected. Seeing the failure rate go down shows adoption.
To take it further: it’s about respect for other team members and their time. When the test failure rate goes down, the build doesn’t break so much. People don’t waste time fixing it, or waiting for it to be fixed.
In the long run, after the team has adopted automated tests, if the rate doesn’t drop to zero, it’s a sign of the codebase’s quality – the team isn’t taking care of consistently failing or flaky tests. You can then make decisions – remove tests, improve the code, or do something completely different. Your choice.
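To make those decisions, it helps to know which tests fail every time and which only fail now and then. Here is a minimal sketch of one way to sort them, assuming you can export each test's recent pass/fail record from your CI system. The function name, the labels and the input shape are all made up for illustration; they don't come from any particular CI tool.

```python
def classify_tests(history: dict[str, list[bool]]) -> dict[str, str]:
    """Label each test from its recent CI results (True = passed).

    `history` maps a test name to its results over the last few builds.
    The shape and the labels are assumptions made for this sketch.
    """
    labels = {}
    for name, results in history.items():
        if all(results):
            # Passed in every recorded build (also true for an empty record).
            labels[name] = "healthy"
        elif not any(results):
            # Failed in every recorded build.
            labels[name] = "consistently failing"
        else:
            # A mix of passes and failures: possibly flaky, worth a look.
            labels[name] = "mixed"
    return labels


if __name__ == "__main__":
    recent = {
        "test_login": [True, True, True],
        "test_export": [False, False, False],
        "test_sync": [True, False, True],
    }
    for test_name, label in classify_tests(recent).items():
        print(f"{test_name}: {label}")
```

A "mixed" label is only a hint. The test may be flaky, or it may have caught a real bug that was later fixed, so it still needs a human review.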
How to track it?
Easy. Any CI tool will show you the number of failures for every build. We want the number, averaged over time, to be as close to zero as possible. Put a graph on the wall and review it at the retrospective.
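If you want to compute the "averaged over time" number yourself, here is a rough sketch. It assumes you can export the number of tests run and failed for each build; the `Build` record, its field names and the window size are hypothetical choices for this example, not part of any CI tool's API.

```python
from dataclasses import dataclass


@dataclass
class Build:
    """One CI build. Field names are hypothetical, chosen for this sketch."""
    build_id: str
    tests_run: int
    tests_failed: int


def failure_rate(build: Build) -> float:
    """Fraction of tests that failed in one build (0.0 if no tests ran)."""
    if build.tests_run == 0:
        return 0.0
    return build.tests_failed / build.tests_run


def rolling_failure_rate(builds: list[Build], window: int = 10) -> list[float]:
    """Average failure rate over the last `window` builds, one value per build.

    `builds` is expected in chronological order. Early builds are averaged
    over however many builds exist so far.
    """
    rates = [failure_rate(b) for b in builds]
    averaged = []
    for i in range(len(rates)):
        recent = rates[max(0, i - window + 1): i + 1]
        averaged.append(sum(recent) / len(recent))
    return averaged


if __name__ == "__main__":
    history = [
        Build("b1", tests_run=100, tests_failed=10),
        Build("b2", tests_run=100, tests_failed=4),
        Build("b3", tests_run=100, tests_failed=0),
    ]
    for build, avg in zip(history, rolling_failure_rate(history, window=2)):
        print(f"{build.build_id}: {avg:.1%}")
```

The rolling average smooths out a single bad build, so the graph on the wall shows the trend rather than the noise. The window size is a judgment call; one sprint's worth of builds is a reasonable place to start.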
Is it comparable?
Nope. This is a team-specific metric. Different people, skills, experience and codebases produce different numbers for different teams. There’s no sense in comparing failure rates across teams.
Want help with visualizing and measuring the quality of your code and tests? Contact me!