Refactoring Kung Fu – Part I

Gil Zilberfeld goes through an example of refactoring legacy code, on the way to creating tests for it
In this series we're taking a look at how to refactor un-refactorable legacy code.
Part IPart IPart III

As long time readers, you know that if we’ve got legacy code, we want tests for that. However, the code doesn’t always play along.

As you’ll see in this example, this is the kind of code that really needs tests. It’s full of logic. It’s alive and working, and it’s so core to the application, that we need to keep going back into it, and we tremble whenever we need to make changes in it.

So what? Tests are important, so write them.

However, things are not that simple. As you look at that legacy code, you know it’s going to be quite a task. The code is messy and full of dependencies. And, that’s not even the main problem: we don’t actually know how it works. Oh, it’s working, for years now, but there’s no updated documentation, and the code itself looks like a recipe out of a pasta book.

Blind testing

Tests have basically a simple construct – define a system with boundaries, pour some known inputs, and see if the outputs come as expected. Our problem starts with the boundaries of the system. It’s hard to draw them. Once we do, we don’t have a comprehensive list of inputs, and of course, what is the expected outcome for them.

Do we back down? Hell no. That is not the kung fu way.

We can still write tests. This special breed of tests is called “characterization tests”. We write characterization tests for a system with unknown or uncertain behavior. They start out like tests without asserts. We pump in inputs, see what comes out in the other end. We agree that the system works as-is, meaning we can trust that for the given inputs, the outputs are the expected outputs. Then we convert these outputs into asserts. Presto, we have tests.

Sometimes, the system is so messed up, characterization tests are the only way to go. However, it’s still defined by its architecture. We may know where the entry points are, but we still need to define what “the other end” is. Is it the database? Calling an external API? An internal state change?

Also, in a medium-to-ginormous size system, with so many possible states, can we cover the code with tests reasonably well? Can we collect and collate all these outputs?

Not so presto

Wouldn’t it be easy if we could make the system boundary a size that is reasonable, and in which we can identify most states, and therefore control what we can, and in fact, write effective valuable tests?

Yes, yes it would.

That requires some guts, a few guiding principles, some tools, and above all, a repeatable method. We can build them, we have the technology.

In fact, when we don’t write tests, and just change the code, we already cross out the guts part (along with the stupid/courageous border). Once we acknowledge that, we can move forward with solving the problem.

The example we’ll explore is based on an actual code base. Imagine a controller doing many things. The code is not necessarily contained just in the controller class. It reads and writes to the database, calls out to other APIs, doing some logic of its own, and returns results through its API.

Our “system” starts out with a REST API in the front, and includes the controller, the data layer, the database, other services, and even a middleware, like Spring. That’s a big system. We want to cut it down to unit-testable size, if possible.

Since the original code is the definition of spaghetti, I thought of re-creating our sample example: The PastaMaker class. Here’s a sneak peek:

Hey, if you want to cook some pasta, you have to break some eggs. Or at least untangle the spaghetti first.

If this looks familiar in some way, stay tuned. We’re going to slay the legacy code dragon.

Leave a Reply

Your email address will not be published. Required fields are marked *