Unit Testing Implementation: The Plan

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III, Leadership I, Leadership II, The Plan

So far, we’ve talked about the process itself, our goals and expectations, what to look for while we’re moving forward, and now it’s time we get to the good stuff.

What does an implementation plan actually look like? A good plan includes these elements:

Training

Remember that when we start, we already have a core team (usually just one) that learned the ropes by itself. While its members can be great ambassadors or mentors, they are usually not trainers. They know what they’ve encountered, and that is usually much less than skilled practitioners and trainers, who’ve seen lots of code and tests.

The other teams, the people who start from scratch, need context, focus and the quickest ramp-up in order to get started. In my introductory courses, I introduce tools as well as effective testing practices: planning, writing, maintaining, working with legacy code, etc. In addition, I expose them to design for testability and TDD. The courses are hands-on, so people can practice the different topics.

Environment preparation

Apart from having the tools available on the developer machines, we need a CI server that’s configured to run the tests and report the test run results. We’d also like to have project templates (maven archetypes, makefiles, etc.) available so people won’t need to start from scratch.

All dependencies (libraries, tools, templates, examples) should be available in a central repository. On day one we want people to start committing tests that are run and reported. We don’t want to have them bump into environmental problems and extinguish their motivation.
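For example, a template can ship with a trivial test already wired in, so the very first commit produces a test result on the CI server. A minimal sketch of what that might look like (JUnit 4 here; the names are placeholders, not part of any specific template):

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

// A placeholder test shipped with the project template, so the CI job
// has something to run and report from the very first commit.
public class SanityTest {

    @Test
    public void projectTemplateRunsTests() {
        // Replace with real tests for the first piece of production code.
        assertTrue(true);
    }
}
```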

Coaching

These are sessions (1-2 hours each, tops) in which an experienced coach (external or internal) sits with one or two people and helps them plan test cases, write tests and review tests for the things they are working on. This way we transfer testing knowledge, and start creating conventions of “this is how we test”. We work on code that’s actively being developed, making it testable and proving that it works.

If you start out with an external coach, this will only scale up to a point. The idea is to start with a small group that can later mentor new people, spreading the practice virally. The ambassadors from the pilot stage can and should support that process.

Communities of practice

We want to continuously improve the way we test, discuss and share our experiences. As we’ve already discussed, there should be forums for discussing and practicing testing. That means we need scheduled time, when people are encouraged to attend, talk about what they did, and learn from others. Test reviews, refactoring together, learning patterns – these meetings breed stronger developers.

These CoP meetings are opportunities to discuss the metrics and goals, and to adapt if help is needed. They are engines for learning and improvement. They also send the message from management that testing is important. As time goes by and fewer coaching sessions are needed, the CoP takes over as the main teaching and mentoring tool.

There you have it. In the next posts, I’ll go through a case study of deploying a unit testing plan.


Leadership in Unit Testing Implementation, Part II

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III, Leadership I, Leadership II, The Plan

We talked about management attention and support, and there’s more leaders can do to help make the process work.

Remember those leading indicators? They don’t collect themselves. If we think about those indicators as a feature, there are customers waiting for them. Management and leaders are those customers.

Turns out the people who care about the metrics are the ones who have the power to facilitate their collection, demand the reports, analyze the patterns and ask for correction plans. Who knew?

The funny thing is that the collection of the metrics is a leading indicator by itself. If the process of metric collection, analysis and feedback is in place, chances are the overall process will go well, since somebody’s supporting it. If it isn’t (for any combination of reasons), people see that the process is “not that important to managers”, which quickly translates to “it’s not that important to us, and to me”. Even if people care about unit testing, they care more about making sure they don’t get caught slipping on other things that management does care about. Safety first.

Embracing change

Even when the process does go well, an anti-pattern may appear: sticking to the initial plan, rather than changing course over time, which is exactly what we expect leaders to do.

An example of that is around our favorite metric:  Management asserts that coverage should increase over time. Since our leaders are wise, they don’t state a minimum coverage threshold, just an indication that coverage is growing. And, we’re not covering old code, just new code. Looks innocent enough.

What happens next will surprise you (not). To keep the coverage increasing, people are encouraged to add tests to their code. But since our developers are also wise, they don’t add tests to code where unit tests don’t make sense (like data transformations, or auto-generated code).

So the coverage doesn’t rise. Alas, if that’s most of their code, they don’t get “coverage points”, or in some measurement systems, even lose some. Remember safety first? Let the gaming begin. Developers may add tests that are ineffective (or even harmful), just to satisfy the metric.

Overwatch

The only way to make sure this doesn’t happen is through retrospection, analysis and feedback. I’ve already said that as stakeholders, leadership can make sure these take place. But should they do all that by themselves?

Here comes the next way leadership helps the process: creating forums where the learners can keep learning from their peers and grow into internal experts. We call those communities of practice: places where practitioners discuss, review, analyze and get feedback, learning from the successes and mistakes of others.

The chance of these forums being created, and continuing to take place over time, is much higher with management support. In these forums, the discussion takes two forms: the tactical level (how to write better tests, refactoring methods, etc.), and the strategic level (looking at the process itself and the metrics, suggesting course corrections, and following up on their implementation).

Now, the more authority we give the communities, the better. We like self-organizing and self-managing teams. But they still need leadership in order to be created, to keep existing, and to get help when they need external resources and effort.

Combine all these leadership support methods, and we’re on our way to a successful implementation.

Leadership in Unit Testing Implementation, Part I

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III, Leadership I, Leadership II, The Plan

Any process we start to roll out requires management support. If we want it to succeed, anyway.

Inside teams, if the team leader opposes the new process, she will work against it, either actively or secretly. If she’s for it, she’ll mandate it. When the team is independent enough to make its own decisions, the team leader will back those decisions and facilitate the team’s success.

When we’re talking about multiple teams, and cross organization processes, it’s not even a question. Not only do we need to make sure the new processes take hold, sometimes we need to make more resources available if they are not there to begin with.

Think about an organization moving from one team writing tests to multiple teams. We need to support all of them at the IT level (enough resources and environments to run the tests), at the branch management level (who works on which branch and when changes move to the trunk), at the automation level (optimizing build performance), and at the coordination level (what happens when the build goes red).

To make a (very) long story short, it takes management time, attention, and a lot of pushing (nicely) to allow the process to take effect.

Oh, and there’s one more thing leaders need: Patience.

Regardless of how simple the process is (and unit testing is definitely not simple), patience is a prerequisite. Any process implementation takes time, and we usually see the fruit of our labor way down the line. Add to that a steep learning curve, and the recipe for impatience is complete.

The learning process in unit testing seems short, if you focus on learning the tools. But until people start writing tests regularly and effectively, and see the benefits, it takes weeks, and usually a lot more.

If there are constraints and conflicts, it will take even longer. Consider a team working inside a legacy code swamp, with a closed architecture they aren’t allowed to change. Their ability to change the code is constrained, and therefore so is their ability to write tests. That means fewer tests written, and often less effective tests at that (you test what you can, regardless of importance).

Expecting coverage to rise under these conditions, and even more so the number of effective tests to increase, is bound to crash into reality. With failed expectations come disappointment, maybe an angry response, loss of faith in the developers’ capabilities, and often calling it quits, way too early.

We’ll continue to discuss what else leaders can do in the next post.


TDD: Mind Your Language


One of the exercises I love to do in my TDD classes is to build a lightsaber in TDD. (Yes, of course that’s how they’re made).

In the exercise, I go through listing all kinds of features and use cases, and the first test we usually write is for turning the lightsaber on. Most times it looks like this:
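Here’s a representative sketch (the Lightsaber class and its Status enum are illustrative names of mine, but the shape is the one I keep seeing):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class LightsaberTest {

    // A typical first attempt: check the "status" through a getter.
    @Test
    public void testGetStatusAfterTurnOnReturnsOnStatus() {
        Lightsaber lightsaber = new Lightsaber();
        lightsaber.turnOn();
        assertEquals(Lightsaber.Status.ON, lightsaber.getStatus());
    }
}
```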

It’s got a weird-ish name, and it’s probably not the first test I’d write (I’d check the off status after creation). But that’s not my pet peeve.

If you’re coming from a development background, like me, a getStatus method on a lightsaber seems perfectly ok. After all, how else would you check if the lightsaber is really on?

There are two issues with this. The smaller one is the premature generalization: Using an enum as a result. Sure, if you’re using a getStatus method, you want to return a meaningful value. Yet, if you TDD like a pro, the tests should tell you when to generalize, like after you have a couple of return values.

But I’ll let that one slide, there’s a bigger issue here.

Talk this way

Did you ever hear a Jedi master, or a Sith lord ask: “What status is that lightsaber? I need to get it”.

No. They don’t talk like that.
Regular humans don’t talk like that.

The only people who talk like that are programmers.

Chances are, if you’re a programmer and you’re still reading, you probably still don’t see the problem. Let me spell it out: We’re coding using terms that are different than the ones used in the business domain.

This results in maintaining two models, the business and the code (and sometimes the test model), all either using different terms, or not carrying the exact meaning. So we need translations. Translations bring mistakes with them (read: bugs).

As time goes by, and we add more code, the two models diverge. Making changes (say, a new requirement) needs more effort, and is more risky. We need to continuously update the business model, and re-translate into the code model. That is hard and error prone.

If we don’t put effort into it, the difference between the models grows. We breed complexity by using two languages, and we pay for it with a lot more effort.

What’s a better way? Well maybe something like this:
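A sketch of what I mean (ignite and isOn are illustrative domain terms, not an official API):

```java
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class LightsaberTest {

    // Same behavior, but the test now speaks the domain language.
    @Test
    public void lightsaberIsOnAfterIgniting() {
        Lightsaber lightsaber = new Lightsaber();
        lightsaber.ignite();
        assertTrue(lightsaber.isOn());
    }
}
```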

Now that’s Jedi talk. You can already feel the Force flowing through it.

Leading Indicators in Unit Testing Implementation, Part III

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III, Leadership I, Leadership II, The Plan

In the last post we talked about the trend of failing builds as an indicator of how the implementation is going.

The final metric we’ll look at, that can indicate how our process will go, is also related to broken builds. It is the “time until broken builds are fixed” or TUBBF. Ok, I made up this acronym. But you can use it anyway.

If we want to make sure the process is implemented effectively, knowing that builds are broken is not enough. When builds break, they need to be fixed.

No surprise there.

Remember that the long-term goal is to have working code, and for that we need people to be attentive, responsive, and quick to fix broken builds. Tracking the TUBBF can help us achieve that goal. We can infer how well people understand the importance of working code by looking at how they treat broken builds.
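The calculation itself is trivial; the work is in capturing the timestamps of the breaking build and the build that fixed it. A rough sketch of the idea (BuildRecord is a made-up stand-in for whatever build history your CI server exposes):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// A rough sketch of tracking TUBBF from build history.
class BuildRecord {
    final Instant finishedAt;
    final boolean green;

    BuildRecord(Instant finishedAt, boolean green) {
        this.finishedAt = finishedAt;
        this.green = green;
    }
}

class TubbfTracker {

    // For each stretch of red builds, measure the time from the first
    // red build until the build that turned the trunk green again.
    static List<Duration> timesUntilFixed(List<BuildRecord> history) {
        List<Duration> result = new ArrayList<>();
        Instant brokenSince = null;
        for (BuildRecord build : history) {
            if (!build.green && brokenSince == null) {
                brokenSince = build.finishedAt;          // the build just broke
            } else if (build.green && brokenSince != null) {
                result.add(Duration.between(brokenSince, build.finishedAt));
                brokenSince = null;                      // fixed
            }
        }
        return result;
    }
}
```

Average it, chart it over time, and watch the trend, just like the other indicators.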

Sharing is caring

One of eXtreme Programming’s principles is shared code ownership, where no single person is the sole caretaker of a piece of code. When our process succeeds, we want to see the corollary: everyone is responsible for every piece of code.

With small teams it’s easier to achieve. Alas, with scale it becomes harder. Usually teams specialize in bits of code, and conjure the old demon of ownership. With ownership comes blame and the traditional passing of the buck.

After all, our CI log says it right there: they broke the build, by committing their code, which doesn’t have any resemblance or relation to our code. We can’t and won’t fix it. They broke it. They should fix it.

Then comes the next logical conclusion: if we didn’t break the build, we can safely continue to commit our code. After all, we know our code works, we wrote it.

And so, every team blames the other team, committing unchecked changes and the build remains red.

(By the way, maybe they committed the last bit that broke the build, but that doesn’t mean their changes were at fault. If a build takes a long time, changes usually pile up until it starts, and the CI flags only the last commit, even though that last one may be innocent.)

Everybody’s Mr. Fix-It

One of the drastic measures we can take is to lock the SCM system when the build breaks. That’ll teach them collective ownership.

But that doesn’t always work. People just continue to work on local copies, believing that somebody else is working relentlessly, even as we speak, on fixing the build.

Another option is to put the training wheels on: train a team to keep its build green without interference from other teams, by developing on team-owned branches. We track the team’s behavior on its branch, encouraging them to fix the build. They are responsible for keeping the build on their own branch green. Only when branch builds are stable and green is it ok to merge them to trunk.

The worst option, and I’ve seen it many times, is having someone else be the bad cop.

Imagine an IT/DevOps/CI master who starts each day checking all the bad news from the night, tracking down the culprits, and making them, but mostly begging them, to make amends. Apart from not making teams responsible for their own code, it doesn’t stop others from committing on top of the broken build.

As long as we can track the TUBBF in some manner, we can redirect the programmers’ behavior toward a stable build, and teach the responsibility of keeping it green. As we do this, we focus on the importance of shared responsibility, and collect a bonus: working, sometimes even shippable, code.

Leading Indicators in Unit Testing Implementation, Part II

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III, Leadership I, Leadership II, The Plan

Part I was HUGE! Now, let’s look at broken builds. We want to see a decrease in their number over time.

This may sound a bit strange. Our CI is supposed to tell us if the build breaks, that’s its job. Isn’t catching more of them a good thing?

Unsurprisingly, the answer is “it depends”.

As far as the earliest feedback goes, the answer is “yes, of course”: the CI system serves us well. However, if we don’t see a decrease in broken builds over time, that may mean the CI process is not working effectively. We should investigate.

CI PI

Let’s trace the steps leading to failing builds and see if we can improve our process.

Are all the tests passing locally? If not, we’re integrating code that fails tests into the trunk, and when the CI build runs those tests, they will fail there too. That’s a big no-no. We may even find out the tests are not run locally at all, and that is definitely a behavior we’d want to improve.

If tests do run and pass locally before they are committed, there might be another problem: isolation. Tests that depend on resources available in the local environment find them there and pass, but in the CI environment those resources are missing, and the tests fail. Broken builds of this kind indicate the team has not yet learned how to write isolated tests.
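To make “isolated” concrete, here’s a sketch of the difference (DiscountCalculator and the file path are made up for illustration): the first test depends on something only a local machine has, the second brings everything it needs with it.

```java
import static org.junit.Assert.assertEquals;

import java.nio.file.Files;
import java.nio.file.Paths;
import org.junit.Test;

public class DiscountCalculatorTest {

    // Not isolated: passes only on machines where this file happens to exist.
    @Test
    public void discountFromLocalConfigFile() throws Exception {
        String rate = Files.readAllLines(Paths.get("C:/config/discount.txt")).get(0);
        DiscountCalculator calculator = new DiscountCalculator(Double.parseDouble(rate));
        assertEquals(90.0, calculator.apply(100.0), 0.001);
    }

    // Isolated: everything the test needs is in the test itself.
    @Test
    public void discountAppliedToPrice() {
        DiscountCalculator calculator = new DiscountCalculator(0.1);
        assertEquals(90.0, calculator.apply(100.0), 0.001);
    }
}
```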

There might even be a bigger issue lurking.

Trust and accelerate feedback

We want to trust the tests in the CI environment, but since they “work on our machine” and not on the CI server, these tests just got a trust downgrade. This can have a weird counter-effect on the way we run them.

Since the results we trust come from the CI server, and local runs produce confusing results, we may stop running tests locally altogether and rely on the CI server to run them correctly. When we do that, we make our feedback cycle longer, but more importantly, we risk tests failing for the right reason on the server while holding the rest of the team hostage until they are fixed.

To get the right feedback early, we need to get back to running tests locally.

We want to increase the number of isolated tests, so they can be run locally and trusted when they fail on the CI server. Isolated unit or integration tests failing before we commit are the first line of defense.

Then, we want to be able to run the non-isolated tests either locally or in as clean an environment as we can manage. The point is to not commit code until we trust it. This may require changing the available environments, modifying the tests to ensure cleanliness, pre-commit integration, or any combination of those.

Can you believe all these improvement opportunities come from a single indicator? The deeper we dig and the more questions we ask, the more opportunities we find for improving the process as a whole.

We’re not done yet.

Implementing Unit Testing – Leading Indicators (part 1)

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III, Leadership I, Leadership II, The Plan

Now that we’ve talked about what we want to achieve, we better spread out some sensors. After all, we’re about to embark on a long and winding road. We want to know if we’re still going the right way, and if not, make a turn.

Leading indicators raise the red flag before the worst has happened. In our case, we don’t want to check back after six months only to find that no one’s written any tests for the last three. We’d like to know sooner.

So what should we track?

If you’re thinking “coverage”, hold your horses. Coverage by itself is not a good progression metric. We sometimes confuse ease of measurement with effectiveness, and coverage is one of those things.

Coverage as a number doesn’t have context – are we covering the right/risky/complex things? Are we exercising code but not asserting (my favorite gaming process)? Are we testing auto-generated code? Without the context, the coverage number has no applicable meaning.

Before we mandate tracking coverage across the entire codebase, or set a coverage goal, remember that while it is easy to track exercised lines of code, the number doesn’t mean anything by itself.

So what do we look for?

The first metric to start tracking is the simplest one: the number of tests. (Actually two: tests written and tests running.)

Our first indicator of whether people are writing tests is to count them. For that, we need a convention for test location, naming, etc. Guess what? These are also needed to run the tests as part of a CI build. Once everything is set up, we can count the tests where they count (pun!).
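Once the convention is in place (say, test classes end with Test and live under src/test), the count can be as dumb as counting files; most CI servers already report the number of tests they ran, so this is just the quick local version. A sketch under that assumed convention:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class TestCounter {

    // Counts test files under an assumed "*Test.java under src/test" convention.
    // A crude proxy for the number of tests; the CI test report is more exact.
    static long countTestFiles(Path projectRoot) throws IOException {
        try (Stream<Path> files = Files.walk(projectRoot.resolve("src/test"))) {
            return files.filter(p -> p.getFileName().toString().endsWith("Test.java"))
                        .count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countTestFiles(Paths.get(args[0])));
    }
}
```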

Then we want to look at the trend over time. The number should go up, as people add more tests. If the trend flatlines, we need to investigate.

One thing about this metric – if you never see a drop in the number of tests, something’s wrong. That probably means tests just stay there, and are not getting re-reviewed, replaced or deleted. In the short term, starting out, we want to see an upward trend. But over the long haul, code changes, and so do the tests. We want to see at least some fluctuations.

So what about that coverage?

Ok, we can measure coverage. We get that out of the box, right?

But we need to know what we’re measuring. Coverage means executed lines of code. So we can look at coverage (or the lack of it) as an indicator in a context we care about.

That could be any of the following:

  • Important flows in the system
  • Buggy code
  • Code we return to over and over
  • New component we want to cover

Or any other interesting bit. When we measure coverage, we want to see a trend of increasing coverage over these areas.

Now, how do you manage that? It requires more tricks, since we want to make sure we measure the right code with the right tests. If the code architecture already supports it, it’s easy: 75% coverage of a specific library, for example.

If, however, you want to measure coverage of a set of classes, excluding the other parts of the library, that requires more handling and management. Usually people don’t go there.

The funny thing is, the more specific you want to get, the less the regular tools help. And the broader numbers lose meaning.

By the way, the coverage metric should go away once you get to sufficient coverage. Once again, it’s not about the number, but about what we want to achieve: stability over time, or regression coverage. We can stop measuring then (and maybe look at other areas).

Ok, we’ll continue the indicators discussion in the next post.


Over Exposure and Privacy Issues


This conversation comes up in every training I do about testing. I know exactly when it will happen. I know what I’m going to say, what they will say, and I know no one comes out completely convinced.

Then violence ensues.

We want to test some code. However, this code started being written in the age of the dinosaurs, its size resembles some of the big ones, and the risk of changing it is the risk of feeding a Jurassic beast.

There’s no point ignoring it: we’ll need to modify it in order to test it. In fact, what’s really bothering us is that some of the information in there needs to be accessible. I suggest changing the visibility of some members.

This escalates quickly

Curious person: “What do you mean make it public”?
Me: You know, make it public. Or add a method that will enable us to set up the class easily, so we can test it.
Curious person: “But that means breaking encapsulation”.
Me: Might be. Is that wrong?
Mildly suspicious person: “Well, the design is there for a reason. The member is private because it needs to be private, we can’t expose it”.
Me: Remember that tests are first-class users of the code? That means that the design can change to support them too.
Mildly angry person: “But you can’t expose everything, what if somebody calls it”?
Me: That’s the point, that the test can call it.
Angry person: “But you can’t ensure that only tests will call it. At some point, someone will call it and create hell on earth. We’ve all been there”.
Mildly angry Me: Yes, it might happen, and we can’t predict everything and plan for everything, software is a risky business.
Very angry person: “THESE RULES WERE WRITTEN IN BLOOD!”

The design perspective

Let’s ignore the fact that you’re defending a horrible untested design, with the argument that changing the accessibility of one member will make it so much worse.

Encapsulation is a great concept. Having public and private members makes perfect sense. Some things we want to expose, others we want to hide.

But the water becomes murky very quickly. If we were talking black and white, only private and public, that would be ok. But for those special times, when private is just not enough…

There’s “friend” (C++).
Or “internal” (.net).
Or implicit package visibility (Java).
Not to mention “protected”.
Or reflection that bypasses everything.

It’s like we’re breaking encapsulation, but not really breaking it, so we feel better.

Let’s face it, design can serve more than one purpose, and more than one client in different ways. The language giving us tools doesn’t solve all the problems. That’s the cool thing about software, it can be molded to fit what we need it to do.

That involves living with not-so-perfect designs. It also means “changing the code just for the tests” is ok, because that’s another good purpose.
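In Java, for example, this often doesn’t even require “public”: dropping the modifier gives package-private visibility, so a test in the same package can reach in while code in other packages still can’t. A sketch (the class and member names are made up):

```java
// Production code, in some package such as com.example.saber.
public class Lightsaber {

    private boolean ignited;

    public void ignite() {
        ignited = true;
    }

    // Package-private on purpose: tests living in the same package can call
    // it, while code in other packages still cannot. The intent is documented
    // right where someone would be tempted to misuse it.
    boolean isIgnited() {
        return ignited;
    }
}
```

A test class placed in the mirrored package under src/test can then call isIgnited() directly, without the member going fully public.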

BUT SOMEBODY WILL CALL IT AND PEOPLE WILL DIE

Yes, somebody might call it.

That somebody might also look at the code and decide to change it, or add an accessor themselves.
It’s silly, not to mention risky, to think that a language keyword is the only thing standing between us and a disaster.

In C, everything is public; how do you prevent the “someone might call it” disaster waiting to happen there? Playing with .h files and custom linking can be a lot more risky.

We do that the way that’s always more effective than the code itself: with processes that involve humans, rather than relying on tools.

We need to understand the purpose of code: creating value for someone. If the code is beautiful but cannot be tested, you don’t get points for using the “right” encapsulation.

The value comes when functionality works, and in order for it to work, we need to check it. If we’ve discussed the risks involved and decided that the value comes from testing, that value usually outweighs the risk that “someone might do something bad with the code” (after many have abused it already).

And if you feel this risk is not negligible, do something about it. Do code reviews, document and share the knowledge, create architecture guidelines.

But don’t rely on a language feature as the first resort.