This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
- Leading Indicators I
- Leading Indicators II
- Leading Indicators III
Now that we’ve talked about what we want to achieve, we’d better spread out some sensors. After all, we’re about to embark on a long and winding road. We want to know if we’re still going the right way, and if not, make a turn.
Leading indicators raise the red flag before the worst has happened. In our case, we don’t want to check in after six months and find that no one has written any tests for the last three. We’d like to know sooner.
So what should we track?
If you’re thinking “coverage”, hold your horses. Coverage by itself is not a good progress metric. We sometimes confuse ease of measurement with effectiveness, and coverage is a prime example.
Coverage as a number doesn’t have context – are we covering the right/risky/complex things? Are we exercising code but not asserting (my favorite gaming process)? Are we testing auto-generated code? Without the context, the coverage number has no applicable meaning.
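To make the "exercising but not asserting" problem concrete, here is a minimal sketch. All the names in it (`discount_ok`, `discount_buggy`, the two test functions and the `passes` helper) are made up for illustration. Both tests execute every line of the function under test, so a coverage tool would report the same number for each, but only one of them can ever fail.

```python
def discount_ok(price, percent):
    """Hypothetical function under test: the correct implementation."""
    return price - price * percent / 100


def discount_buggy(price, percent):
    """Same function with a deliberate bug (wrong sign)."""
    return price + price * percent / 100


def exercising_test(discount):
    # Runs every line of the function but checks nothing,
    # so it yields full line coverage and can never fail on a wrong result.
    discount(100, 10)


def asserting_test(discount):
    # Same coverage, but this one actually checks the behavior.
    assert discount(100, 10) == 90


def passes(test, discount):
    """Run a test against an implementation; report whether it passed."""
    try:
        test(discount)
    except AssertionError:
        return False
    return True
```

Run against the buggy implementation, `exercising_test` still passes while `asserting_test` fails. The coverage number is identical in both cases, which is exactly why it needs context.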
Before we mandate tracking coverage for the entire codebase, or set a coverage goal, remember that while it is easy to track exercised lines of code, the number doesn’t mean anything by itself.
So what do we look for?
The first metric to start tracking is the simplest one: the number of tests. (Actually two: tests written and tests running.)
Our first indicator of whether people are writing tests is the test count. To count tests, we need conventions for test location, naming, and so on. Guess what? We need those anyway to run the tests as part of a CI build. Once everything is set up, we can count the tests where they count (pun!).
Then we want to look at the trend over time. The number should go up, as people add more tests. If the trend flatlines, we need to investigate.
One thing about this metric – if you never see a drop in the number of tests, something’s wrong. That probably means tests just stay there, and are not getting re-reviewed, replaced or deleted. In the short term, starting out, we want to see an upward trend. But over the long haul, code changes, and so do the tests. We want to see at least some fluctuations.
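Both signals can be checked mechanically once the counts are collected. The sketch below assumes we record one test count per period (say, per week) in a plain list. The function names and the four-sample window are arbitrary choices for illustration, not a recommendation.

```python
def is_flatlined(counts, window=4):
    """True if the test count has not grown over the last `window` samples."""
    if len(counts) < window:
        return False  # not enough history to judge
    recent = counts[-window:]
    return recent[-1] <= recent[0]


def has_ever_dropped(counts):
    """True if the count decreased at least once between consecutive samples."""
    return any(later < earlier for earlier, later in zip(counts, counts[1:]))
```

A flatline early on is the signal to go and ask what is blocking people. A long history in which `has_ever_dropped` stays false is the other signal: tests are probably being added but never revisited.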
So what about that coverage?
Ok, we can measure coverage. We get that out of the box, right?
But we need to know what we’re measuring. Coverage only tells us which lines of code were executed. So we can look at coverage (or the lack of it) as an indicator in the context we care about.
That could be any of the following:
- Important flows in the system
- Buggy code
- Code we return to over and over
- New components we want to cover
Or any other interesting bit. When we measure coverage, we want to see a trend of increasing coverage over these areas.
Now, how do you manage that? That requires more tricks, since we want to make sure we measure the right code and the right tests. If the code architecture already supports it, it’s easy: for example, a target of 75% coverage for a single library.
If, however, you want to measure coverage of a set of classes, excluding the other parts of the library, that requires more handling and management. Usually people don’t go there.
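If you do want to go there, one low-tech option is to post-process the per-file numbers your coverage tool already reports. The sketch below assumes you can export, for each file, the number of covered lines and the total number of measured lines. The dictionary shape, the `scoped_coverage` name and the file paths are all hypothetical.

```python
from fnmatch import fnmatch


def scoped_coverage(per_file, patterns):
    """Coverage percentage over only the files matching any of `patterns`.

    `per_file` maps a file path to a (covered_lines, total_lines) pair.
    Returns None when nothing matches, rather than a misleading 0.
    """
    covered = total = 0
    for path, (hit, lines) in per_file.items():
        if any(fnmatch(path, pattern) for pattern in patterns):
            covered += hit
            total += lines
    return 100.0 * covered / total if total else None


# Hypothetical per-file data: two files we care about, one auto-generated.
per_file = {
    "billing/invoice.py": (80, 100),
    "billing/tax.py": (10, 100),
    "generated/stubs.py": (0, 1000),
}
billing_only = scoped_coverage(per_file, ["billing/*"])
everything = scoped_coverage(per_file, ["*"])
```

With this data, the number for the billing files is 45%, while the number for the whole codebase is 7.5%, dragged down by generated code nobody intends to test. The scoped number is the one worth trending.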
The funny thing is, the more specific you want to get, the less the regular tools help. And the broader the numbers, the less they mean.
By the way, the coverage metric should go away once you get to sufficient coverage. Once again, it’s not about the number but about what we want to achieve: stability over time, or regression coverage. We can stop measuring then (and maybe look at other areas).
Ok, we’ll continue the indicators discussion in the next post.