This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
  • Goals
  • Outcomes
  • Leading Indicators I
  • Leading Indicators II
  • Leading Indicators III
  • Leadership I
  • Leadership II
  • The plan

Now that we’ve talked about what we want to achieve, we better spread out some sensors. After all, we’re about to embark on a long and winding road. We want to know if we’re still going the right way, and if not, make a turn.

Leading indicators raise the red flag before the worst has happened. In our case, we don’t want to check in after six months and see no one’s written any tests for the last three months. We’d like to know sooner.

So what should we track?

If you’re thinking “coverage”, hold your horses. Coverage by itself is not a good progression metric. We sometimes confuse ease of measurement with effectiveness, and coverage is one of those things.

Coverage as a number doesn’t have context – are we covering the right/risky/complex things? Are we exercising code but not asserting (my favorite way to game the metric)? Are we testing auto-generated code? Without that context, the coverage number has no actionable meaning.

Before we mandate tracking coverage across the whole codebase, or set a coverage goal, remember that while it is easy to count exercised lines of code, the number doesn’t mean anything by itself.

So what do we look for?

The first metric to start tracking is the simplest one: the number of tests. (Actually two: tests written and tests running).

Our first indicator of whether people are writing tests is to count the tests. For that, we need to agree on conventions for test location, naming, and so on. Guess what? These are also needed to run the tests as part of a CI build. Once everything is set up, we can count the tests where they count (pun intended!).
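To make this concrete, here’s a minimal sketch in Python. It assumes a convention where test files are named test_*.py and test functions start with test_ – those names are example assumptions, not a prescription; adjust them to whatever convention your team picks.

```python
# count_tests.py -- a minimal sketch, assuming tests live in files named
# test_*.py and test functions start with "test_" (adjust to your convention).
import ast
import pathlib


def count_tests(root="."):
    """Count test functions under the given root, by naming convention."""
    total = 0
    for path in pathlib.Path(root).rglob("test_*.py"):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        # Count every (sync or async) function whose name starts with "test_",
        # whether it sits at module level or inside a test class.
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name.startswith("test_"):
                total += 1
    return total


if __name__ == "__main__":
    print(count_tests())
```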

Then we want to look at the trend over time. The number should go up, as people add more tests. If the trend flatlines, we need to investigate.

One thing about this metric – if you never see a drop in the number of tests, something’s wrong. That probably means tests just stay there, and are not getting re-reviewed, replaced or deleted. In the short term, starting out, we want to see an upward trend. But over the long haul, code changes, and so do the tests. We want to see at least some fluctuations.
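A small follow-up sketch, assuming the count_tests() function from the previous snippet is saved as count_tests.py: append a dated count to a CSV on every CI run, so the trend can be charted and a flatline (or a healthy occasional drop) is easy to spot.

```python
# track_test_trend.py -- a minimal sketch building on the previous snippet.
import csv
import datetime

from count_tests import count_tests  # the hypothetical module from the sketch above


def record(csv_path="test_trend.csv"):
    """Append today's date and the current test count; chart the file later."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.date.today().isoformat(), count_tests()])


if __name__ == "__main__":
    record()
```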

So what about that coverage?

Ok, we can measure coverage. We get that out of the box, right?

But we need to know what we’re measuring. Coverage means executed lines of code. So we can look at coverage (or the lack of it) as an indicator in the context we care about.

That could be any of the following:

  • Important flows in the system
  • Buggy code
  • Code we return to over and over
  • New component we want to cover

Or any other interesting bit. When we measure coverage, we want to see a trend of increasing coverage over these areas.
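For example, here’s a minimal sketch using coverage.py that measures only one area of interest. The package path and the test module name are hypothetical placeholders for whichever flow or component you care about.

```python
# Measure coverage for one specific area of the system, not the whole codebase.
import unittest

import coverage  # coverage.py

# Only lines in the area we care about are measured (hypothetical path).
cov = coverage.Coverage(include=["myapp/checkout/*"])
cov.start()

# Run the tests that exercise that flow (hypothetical test module name).
unittest.main(module="tests.test_checkout", exit=False)

cov.stop()
cov.save()
cov.report()  # percentages for the included files only; compare run to run for the trend
```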

Now, how do you manage that? That requires more tricks, since we want to make sure we measure the right code, and the right tests. If the code architecture already supports it, it’s easy: 75% coverage of a library, for example.

If, however, you want to measure coverage of a set of classes, excluding the other parts of the library, that requires more handling and management. Usually people don’t go there.
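If you do go there, one way is to collect data once over the whole test run and then filter the report down to the modules you care about. A minimal sketch, again with coverage.py; all the paths here are hypothetical examples.

```python
# Filter an existing coverage run down to a chosen set of modules.
import coverage  # coverage.py

# Load data collected by a full test run (e.g. "coverage run -m pytest").
cov = coverage.Coverage()
cov.load()

# Report only on the modules we care about, leaving out the rest of the
# library and auto-generated code (all patterns are example assumptions).
cov.report(include=["mylib/billing/*", "mylib/pricing/*"],
           omit=["mylib/generated/*", "*/tests/*"])
```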

The funny thing is, the more specific you want to get, the less the regular tools help. And the broader numbers lose meaning.

By the way, the coverage metric should go away once you reach sufficient coverage. Once again, it’s not about the number, but about what we want to achieve – stability over time, or regression coverage. We can stop measuring then (and maybe look at other areas).

Ok, we’ll continue the indicators discussion in the next post.

1 Comment

David V. Corbin · June 1, 2017 at 6:32 pm

Many good points. My favorite observation is: “Metrics are evil – but a necessary evil”

A big warning on Coverage – even 100% can easily leave major gaps. One needs to consider Cyclomatic Complexity (number of paths through the code) and perform metrics against that.

Finding the places for effective investment in testing (as you point out) is key. My single biggest recommendation is to establish a practice where every deficiency/defect/bug in code is first addressed by a failing test, and then made into a passing test. Track these as “reactive” tests. Do this for 3-5 sprints [about 10 weeks] and then look at the distribution – you WILL learn a lot about your system.
