Refactoring Kung-Fu, Part VI

Gil Zilberfeld explains refactoring legacy code when using ApprovalTests and something unexpected happens
In this series we're taking a look at how to refactor un-refactorable legacy code.
Part I | Part II | Part III | Part IV | Part V | Part VI


Ok, time to run the test.

Well, that’s interesting. The test doesn’t even run to the end; we’re crashing before it completes. Let’s see what’s wrong.

Seems that this line breaks:

when running the case where the dish includes: Sauce = Pesto, Pasta = Ravioly. In this case, the subList method is invoked with (1,0), which causes the IllegalArgumentException. Turns out subList‘s fromIndex (the first parameter) must not be greater than the toIndex (the second). I bet that never happens in legacy code.
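For reference, here's a minimal, stand-alone sketch of that JDK behavior (not the project's code, just the subList contract):

    import java.util.Arrays;
    import java.util.List;

    public class SubListDemo {
        public static void main(String[] args) {
            List<String> ingredients = Arrays.asList("basil", "garlic", "olive oil");

            // fromIndex (1) is greater than toIndex (0), so the JDK throws
            // IllegalArgumentException: fromIndex(1) > toIndex(0)
            List<String> broken = ingredients.subList(1, 0);
            System.out.println(broken);
        }
    }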

What happens now? We’ve clearly identified a case where the code breaks. It is one of those “how did it ever work” moments. What do we do?

Let’s investigate a bit more. Let’s run each case separately to see which cases cause the crash, and which just fail the approval. Is this a big problem, or a single case problem?

Turns out it’s just this case; all the other cases don’t crash. What do we do? Well, we’ve identified a case that breaks the system, and we still don’t know enough about the system to create a test for it.

We can rely on the “how did it ever work” feeling, and therefore think: this case doesn’t occur in the wild (otherwise we would have known about it), so there’s not much sense in keeping it in check. As part of an investigation in real-world legacy code, I would try to reproduce the breaking case in the system, or at least understand why it hasn’t occurred yet.

What do we do?

We can disregard that case, and not use it as a characterization of the system. All other cases don’t cause breakage, and we can work with them.

And yet, it doesn’t feel right. The behavior exists in the code, and it has some kind of effect (the breaking kind). Maybe we want to make sure that although this is a breaking behavior, it is still a behavior we want to maintain. In that case, we can modify the test to something like this:
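The snippet isn't reproduced here, but the idea is roughly this (a sketch only; the Dish fields, the cook signature and the log are placeholders, not the actual project code):

    // Sketch: wrap the crashing call, log the exception, and let the
    // approved file capture the crash as part of the behavior.
    for (Dish dish : dishes) {
        try {
            maker.cook(dish.sauce, dish.pasta);
        } catch (IllegalArgumentException e) {
            log.append("cook crashed for ").append(dish)
               .append(": ").append(e.getMessage()).append("\n");
        }
    }
    Approvals.verify(log.toString());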

This way, we also log the error as part of the process, and our refactoring preserves the behavior.

We can also modify the PastaMaker code to not break (swallow the exception, fix it ad-hoc, put the offending line inside a try-catch block and handle it there, or any other creative method). While this is a valid option, it is risky, because what we’re actually doing is not refactoring; we’re changing functionality without safeguards.

The option I chose is to disregard the breaking case for now, and add a test for it later. I’m planning to come back to it and make sure it is covered, once I know more about it.

Also, if we leave the approval tests as part of the CI, I’m not sure I want to preserve an exception throwing behavior. While it is the current behavior, I don’t intend to leave it like that eventually, which goes back to writing that test later.

So for now, I’m commenting out the last line of the dish list:

Next: Explore the output.

Refactoring Kung-Fu, Part V

In this series we're taking a look at how to refactor un-refactorable legacy code.
Part I | Part II | Part III | Part IV | Part V | Part VI


Last time we talked about flow analysis in the code. We figured out the cases we want covered. But, even for these cases, we don’t know what the expected result is.

It’s worse than that in the real world. In our case, after we’ve drawn the boundary and decided which code stays and which waits around the corner, we remain with code that seems to have no known side effects.

But, what if our code, within the boundary, is more complex, and relies on other frameworks or libraries? It could be that there are side effects we don’t see directly from the code (like the calling method scenario from last time).

Ideally, in the real scenario, we would want to log everything that the system does, and from that, deduce what our asserts should be. Luckily, there’s a tool for that.

Enter ApprovalTests

Remember we talked about characterization tests? Those that we run in order to know what the system does, and then make those our asserts. ApprovalTests does that, and even creates those asserts. Kinda.

The way ApprovalTests works is that we write a couple of tests. When we run them, it creates two text files: the “received” file and the “approved” file. The “approved” file becomes our golden template. Every time we run the test, the regenerated “received” file is compared to the “approved” file. If they are completely the same, then at least we know that the code behaves as it did before (to the best of our knowledge). If the files are different, we’ve broken something. Excellent for refactoring.
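To make this concrete, here's a minimal sketch of what an ApprovalTests test looks like in Java (assuming the library's Approvals.verify API and the UseReporter/DiffReporter annotations; the content of the string is made up):

    import org.approvaltests.Approvals;
    import org.approvaltests.reporters.DiffReporter;
    import org.approvaltests.reporters.UseReporter;
    import org.junit.Test;

    public class ApprovalSketchTest {

        @Test
        @UseReporter(DiffReporter.class)
        public void logIsApproved() {
            // Whatever we pass in is written to the "received" file and
            // compared against the "approved" file next to the test.
            String log = "dispensed: pesto\ncooked: penne\n";
            Approvals.verify(log);
        }
    }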

But what do these files actually contain? Well, everything we want to write. ApprovalTests ties into the standard IO stream, so whatever we write in the tests and in the code goes there. The more we write there, the better our comparison between changes becomes. If however, we write just a bit, we may know that this bit hasn’t changed, but not a lot about other cases.

Once we’re done with our refactoring, we can either throw away the tests, or keep them, along with the “approved” version, in our CI. Not ideal, but a lot more than no tests.

You can read more about it on the ApprovalTests site, of course. For now, I’m assuming you have some familiarity with how it works, and continue from here.

Let’s check out the test

If you look at our tests folder, you’ll see a single test, an ApprovalTests test. I’m using the @DiffReporter  annotation, so if anything is different between the “received” and “approved” files, it triggers my diff tool to run.

Also, as you can see in our test, there are no asserts of any kind, just a simple:

Which basically means nothing. For now.

Now, if we want to cover our code with tests (or ApprovalTests), we need to collect a lot more logging. We’ve already covered what cases we need to track, and we’ve modified the code in order to easily add that logging.

Next step? Add a couple of mocks. Remember the Dispenser interface? Let’s create a mock dispenser. We can use a mocking framework, but it’s much easier to create a manual mock. Since only the test uses it, I’ll create it in the test folder. Here it is:

As you can see from the implementation, whenever a method on the mock gets called, it adds the information to the log. And I’ve added a nice  toString()  override for getting the log. Since the Dispenser interface is our boundary, we’d like to know everything that goes through it. I’m not doing this now, but I’d also log what I’m returning if I think it makes sense.
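The mock itself isn't shown in this excerpt, but a minimal sketch of the pattern looks like this (the Dispenser method names and parameters are placeholders; the logging and the toString() override are the point):

    // Sketch only: dispenseIngredient/dispensePasta stand in for the real interface methods.
    public class MockDispenser implements Dispenser {

        private final StringBuilder log = new StringBuilder();

        @Override
        public void dispenseIngredient(String ingredient, int amount) {
            log.append("dispenseIngredient: ").append(ingredient)
               .append(", amount=").append(amount).append("\n");
        }

        @Override
        public void dispensePasta(String pasta) {
            log.append("dispensePasta: ").append(pasta).append("\n");
        }

        @Override
        public String toString() {
            // The test collects everything that crossed the boundary through here
            return log.toString();
        }
    }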

Note that logging doesn’t have to be concentrated in the mocks. You can also spread all kinds of logging in the code itself. Then collect those later into the test.

Now that we have logging going on, let’s write a real test. Remember the whole test case analysis? Here’s how the test looks; it runs many cases (almost like a parameterized test):

All the dishes are cases we’ve identified and want to run. And here’s the Dish class (again, in the test folder, it’s not part of the production code. Maybe in the future):

As you can see, I want the dishes to also add themselves to the log, so now I know what goes into the PastaMaker, in addition to what goes out.
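The class isn't reproduced here, but the shape is roughly this (field names are placeholders): a small data holder whose toString() describes the inputs, so the approved file shows what went in as well as what came out.

    // Sketch only: the real Dish has its own fields and formatting.
    public class Dish {
        final String sauce;
        final String pasta;
        final String place;

        Dish(String sauce, String pasta, String place) {
            this.sauce = sauce;
            this.pasta = pasta;
            this.place = place;
        }

        @Override
        public String toString() {
            return "Dish[sauce=" + sauce + ", pasta=" + pasta + ", place=" + place + "]";
        }
    }

The test then loops over the list of dishes, appends each dish to the log, calls cook with its values, and finally hands the whole log to Approvals.verify.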

Next time: Running the test and exploring the results. It’s going to be a bumpy ride.

 

Refactoring Kung Fu – Part IV

Gil Zilberfeld explains about refactoring test cases in legacy code
In this series we're taking a look at how to refactor un-refactorable legacy code.
Part I | Part II | Part III | Part IV | Part V | Part VI


The current code version we’re talking about is tag 1.1.

Last time, we surrounded our code with a boundary, doing some early refactoring. We now know the entry points, and the exits are covered by interfaces, leaving code we don’t want to deal with outside our system. We can basically start writing tests, but which ones? How do we know we can cover every case?

The answer is we don’t and can’t know. But we do have clues in the code itself. In our case, it’s the conditions in the code. In other cases in our legacy code, it is about identifying the values that can affect the different flows in the code.

In our example, the options are combinations of sauces, pasta types and places. We’ll take a look at those in a moment, but we need to remember to also look beyond the code.

Double take

There’s another aspect of our cases that is not apparent in the code. What if the cook method is called more than once? From reading our example, you can see that the IngredientsList is created with each call. However, what if it was created in the constructor of the PastaMaker class? Keeping state between method calls can have an effect that may not be visible directly from the code.
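For illustration only (not the project's code), the difference is between these two shapes:

    // Shape 1: no state survives between calls - each cook() builds its own list.
    public class StatelessPastaMaker {
        public void cook(String sauce, String pasta) {
            IngredientsList ingredients = new IngredientsList(); // fresh on every call
            // ... analyze sauce/pasta and use the list ...
        }
    }

    // Shape 2: the list lives in a field - a second call sees whatever the first one
    // left behind, so the order and number of calls start to matter.
    public class StatefulPastaMaker {
        private final IngredientsList ingredients = new IngredientsList(); // created once

        public void cook(String sauce, String pasta) {
            // ... the same list accumulates across calls ...
        }
    }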

In our refactoring example, we won’t go there. For us, each flow inside the cook method is isolated from other flows, and therefore we can cover those flows by different tests, exercising the method once. In more complex cases in legacy code, where the state of one method call affects subsequent calls, we might consider better coverage by calling our code once, twice, maybe more, and in different combinations.

Gotta catch them all

If we want to cover all possible combinations in our code, we’re going to need 4 sauces x 2 pasta types x 4 places = 32 tests. For one lousy method. Imagine how many combinations we have in the legacy code counterpart. If we want to be thorough, we need to cover them all.

If we’re feeling a bit gutsy, we can start eliminating cases. For example, if we look at the places, we can see that they don’t affect the conditions, only the results. That means that, in the current code, we can consider cases with the same pasta and sauce but different places as equivalent. Meaning, we can reduce each place quartet of tests to a single test, leaving 4 sauces x 2 pasta types = 8 tests.

I am feeling a little gutsy, so I’ll do that. Now the only question remains: How do we test?

We’ve already talked about characterization tests. To implement them, we’ll need to write a test for each test case that we’ve identified, run and see what we get, and specify the actual result as the expected one.

And what is it that “we get”? In our case, whatever comes out of the system. Meaning, every call through the IDispenser interface. We need to log all the calls, in the order they come out, with all the parameters.

The option I’ll take here is with Approval Tests. I’ll talk about it shortly, and go into the tests next time.

Refactoring Kung Fu: Part III

Gil Zilberfeld explains method extraction and constructor injection for refactoring legacy code
In this series we're taking a look at how to refactor un-refactorable legacy code.
Part I | Part II | Part III | Part IV | Part V | Part VI


Check out the original version of the code (Java, and also in C#, in tag 1.0).

Right, time to move some code around. Before we do that, remember step #1 in our process: Draw a line around our tested code. For us it’s mainly the “cook” method. That’s the heart of our code and the code we want to refactor.

The “cook” method is the entry point. Where does our system end? In our example, it is the private methods (which I didn’t add code to). Let’s take a look at them:

There are two types of methods here. The first two methods go outside the class (or would if there was an implementation). In the real world, they may call another REST service, or send an async message. In the real world, that code can also be a combination of inlined logic and external calls.

The last four are “regular” private methods. This is code we’ve extracted, that, in our example, stays in the class. Or calls another class that’s still within the system boundaries.

In general, these are the two kinds of methods we’re going to deal with: the ones that call outside the boundary of our system, and those that stay within it. That is important, because if we want to push something outside, we want to establish a clear boundary, and we use *methods* to do it.

Now that we have a boundary, let’s push code outside it. And how do we do it? Introduce an interface. You’ve probably seen that coming. Let’s call this interface Dispenser. In our case it has two methods:
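The declaration isn't shown in this excerpt; as a sketch, with placeholder method names, it's something like:

    // Sketch only: the real interface's two methods have their own names and parameters.
    public interface Dispenser {
        void dispenseIngredient(String ingredient, int amount);
        void dispensePasta(String pasta);
    }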

Basically, we extract the code in the private method and move it beyond the interface, and whatever parameters we pass to the private method are going to be parameters in the interface method. This is the kind of low-risk, and even possibly automated, refactoring for introducing interfaces and method extraction. If the code in the private method we’re moving accesses local or instance variables, we should pass them as well to the interface.
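As a sketch of the mechanics (names are placeholders), a private method that used to do the external work itself ends up forwarding to the interface:

    // Before (sketch): the private method talks to the outside world directly.
    private void addSauce(String sauce, int amount) {
        // ... REST call, async message, or other external work ...
    }

    // After (sketch): the body moved behind the Dispenser, and the method just forwards
    // its parameters (plus any local or instance variables it used).
    private void addSauce(String sauce, int amount) {
        dispenser.dispenseIngredient(sauce, amount);
    }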

And it doesn’t have to be a single interface, like in this example. We can define two interfaces: IngredientDispenser and PastaDispenser. Whatever makes sense in terms of cohesion. The boundary we’re drawing can be temporary for our refactoring, but also can remain later because of separation of concerns, and a better design.

We need to introduce the interface somehow into the class. I decided to inject it through the constructor, but adding a setter works as well.

So here’s the constructor.
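It isn't reproduced here, but a minimal sketch of constructor injection looks like this:

    public class PastaMaker {

        private final Dispenser dispenser;

        // The Dispenser comes from the outside, so a test can pass in a mock.
        public PastaMaker(Dispenser dispenser) {
            this.dispenser = dispenser;
        }

        // ... cook() and the private methods use this.dispenser ...
    }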

Again, this is a modification to the code, but it’s a low-risk change.

But now we have a clear boundary, and we can use it later. The next step though is test case analysis.

The modified code is under tag 1.1 (Added Dispenser interface through constructor)

Refactoring Kung Fu – Part II

Gil Zilberfeld explains the principles of refactoring legacy code
In this series we're taking a look at how to refactor un-refactorable legacy code.
Part I | Part II | Part III | Part IV | Part V | Part VI


Last time I gave you a sneak peek at our PastaMaker. You can check the whole project here.
Let’s take a closer look.

The main logic is in our PastaMaker, so focusing on testing it makes sense. Most of the logic is there, and so are the changes we need to make. We’ll need to do that very carefully, as the system is not test-friendly.

As you can see, our PastaMaker has one main method, “cook”. It takes two parameters, analyzes them, and based on their value does all kinds of things. The “all kinds of things” are in private methods that can call outside the class. I’ve left most of them empty for our example. In the real world, the content of some of the private methods was inlined in the main method. Also in the real world, there are dozens of conditions and dozens of lines of handling code in that class, so the scale is larger. The principles that we’ll use still apply though.

Do you see the if-else-if-else-if logic? That’s the main problem. It’s not only hard to understand; the order of the clauses is important. It’s hard to change it without risking changing the functionality. Also, what we do in them can impact subsequent conditions.

How did we get here?

No one plans for this kind of code, but you can see how it gets built, right? You don’t want to touch the “already working” cases. You find the least risky, most probable place to add your new case, close your eyes, commit the code and run away screaming. Rinse and repeat this process for a few years, and we get to this structure.

Our PastaMaker is a great candidate for refactoring. There’s also something that we can see from the code that can help us: we can identify the different cases (but are they all the cases?). In the original code, the conditions are similar – the inputs to the methods are analyzed in the same way (Strings instead of enums). However, the number of case combinations can be astounding.

Our approach to refactoring this code needs some discipline, and some guiding principles.

  • Put (limited) trust in our tools

Our tools (in this case my IDE) will help with automatic refactoring. Depending on the tools we use, we can usually trust renaming, extracting, and sometimes even moving methods around. We can also rely on identification of “unreachable code” after a return. But we don’t trust our tools completely. For example, inverting if-else-if clauses automatically is dangerous. The semantics of our application are not known to our tools. Our trust has limits, and tools can only take us so far.

  • Minimize risks

While I’d like the IDE to refactor everything for me, when I move code around, or introduce interfaces, and make other changes my tools can’t do reliably, I need to trust myself.
If this is a critical component, like the PastaMaker, I don’t. When we refactor critical code, we do it with a pair. Or in a group. The process may be long, and we’re going to make mistakes. A partner can help identify the errors, pick up on missing cases, and in general keep us moving forward. We minimize risks where we can, and that includes another set of eyes.

  • Refactoring means NOT adding code as we go

This is very important. We’re already knee-deep in code we don’t understand. Even worse, we may think we do, and that may not be true. Until we finish making changes, and have more confidence that the code works because it has tests around it, we don’t add any new functionality. This is, by the way, one of the genius pillars of TDD: separating functionality from refactoring. Only this time, we’re going the other way around.

  • New code is only added after, and with accompanying tests

Once the code is ready in terms of form, and is stable with tests, only then can we add new functionality. And we use test-first.

  • Leave the code better than how we found it

We don’t refactor for nothing. We want better code we can add new functionality to. That means, better, simpler, factored code with tests. Oh, and since we’re doing this in a pair, at least two people agree that the code is better than the original.

Principles are not enough

So what’s our plan of attack? The following techniques help us.

  • Define the boundaries of our system-under-test

We need to decide where our system starts and ends. We’re going to move code inside it, inject inputs to one end, and sample on the other end. But we start with the boundaries.

  • Build adapters around those boundaries

Once we’ve decided where the other end is, we’ll need to create adapters to mark our territory. Those adapters will call the original code, and can help us decouple dependencies. In further testing we can use the adapter for mocking.

I’m pretty sure when you hear “adapters” you think “interfaces”. We’ll use them, but that is a language construct. If we’re working in C, for example, we’ll need something similar, since there’s no “interface” construct in C.

  • Put as many safeguards in place as we can while refactoring
    We don’t have tests yet, but we will eventually. We need to think about all the cases we want to cover, and we want this comprehensive list early, before we touch the code.
    Later, we’re going to put information in the code that not only helps our tests, but also helps us debug them when they fail. Feedback is our friend.

Next time, we start moving code around.

Refactoring Kung Fu – Part I

Gil Zilberfeld goes through an example of refactoring legacy code, on the way to creating tests for it
In this series we're taking a look at how to refactor un-refactorable legacy code.
Part I | Part II | Part III | Part IV | Part V | Part VI


As long-time readers, you know that if we’ve got legacy code, we want tests for it. However, the code doesn’t always play along.

As you’ll see in this example, this is the kind of code that really needs tests. It’s full of logic. It’s alive and working, and it’s so core to the application, that we need to keep going back into it, and we tremble whenever we need to make changes in it.

So what? Tests are important, so write them.

However, things are not that simple. As you look at that legacy code, you know it’s going to be quite a task. The code is messy and full of dependencies. And, that’s not even the main problem: we don’t actually know how it works. Oh, it’s working, for years now, but there’s no updated documentation, and the code itself looks like a recipe out of a pasta book.

Blind testing

Tests have basically a simple construct – define a system with boundaries, pour in some known inputs, and see if the outputs come out as expected. Our problem starts with the boundaries of the system. It’s hard to draw them. Once we do, we don’t have a comprehensive list of inputs, and of course, we don’t know what the expected outcome for them is.

Do we back down? Hell no. That is not the kung fu way.

We can still write tests. This special breed of tests is called “characterization tests”. We write characterization tests for a system with unknown or uncertain behavior. They start out like tests without asserts. We pump in inputs, see what comes out in the other end. We agree that the system works as-is, meaning we can trust that for the given inputs, the outputs are the expected outputs. Then we convert these outputs into asserts. Presto, we have tests.

Sometimes, the system is so messed up, characterization tests are the only way to go. However, it’s still defined by its architecture. We may know where the entry points are, but we still need to define what “the other end” is. Is it the database? Calling an external API? An internal state change?

Also, in a medium-to-ginormous size system, with so many possible states, can we cover the code with tests reasonably well? Can we collect and collate all these outputs?

Not so presto

Wouldn’t it be easy if we could make the system boundary a size that is reasonable, and in which we can identify most states, and therefore control what we can, and in fact, write effective valuable tests?

Yes, yes it would.

That requires some guts, a few guiding principles, some tools, and above all, a repeatable method. We can build them, we have the technology.

In fact, when we don’t write tests, and just change the code, we already cross out the guts part (along with the stupid/courageous border). Once we acknowledge that, we can move forward with solving the problem.

The example we’ll explore is based on an actual code base. Imagine a controller doing many things. The code is not necessarily contained just in the controller class. It reads and writes to the database, calls out to other APIs, does some logic of its own, and returns results through its API.

Our “system” starts out with a REST API in the front, and includes the controller, the data layer, the database, other services, and even a middleware, like Spring. That’s a big system. We want to cut it down to unit-testable size, if possible.

Since the original code is the definition of spaghetti, I thought of re-creating it as our example: the PastaMaker class. Here’s a sneak peek:
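The snippet isn't reproduced in this excerpt, but the shape of the problem is roughly this (an illustrative sketch, not the actual PastaMaker code): string-based inputs, order-dependent if-else-if chains, and private methods that may call outside the class.

    // Illustrative sketch only.
    public class PastaMaker {

        public void cook(String sauce, String pasta) {
            if (sauce.equals("Alfredo") && pasta.equals("Spaghetti")) {
                prepareCream();
                boilPasta(8);
            } else if (sauce.equals("Pesto")) {
                if (pasta.equals("Penne")) {
                    addPineNuts();
                }
                boilPasta(10);
            } else if (pasta.equals("Spaghetti")) {
                // reordering these clauses silently changes which branch wins
                boilPasta(9);
            } else {
                orderTakeout();
            }
        }

        private void prepareCream() { }
        private void addPineNuts() { }
        private void boilPasta(int minutes) { }
        private void orderTakeout() { }
    }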

Hey, if you want to cook some pasta, you have to break some eggs. Or at least untangle the spaghetti first.

If this looks familiar in some way, stay tuned. We’re going to slay the legacy code dragon.

Real Life TDD With Spring – The Road to the First Test III

Gil Zilberfeld explains the differences between the APIs for the TDD spring example.
What does TDD look like in real-world development of microservices or web interfaces? That's what this series is all about.
1. Introduction | 2. The requirements | 3. Test case analysis I | 4. Test case analysis II | 5. Test case analysis III | 6. Test case analysis IV | 7. Setting up a project with Spring Boot | 8. Which tests to write? | 9. API design

We’ve come up with a few alternatives for the APIs (their draft versions, anyway). I’m going to weigh them here, and remind ourselves: this is the internal API form, but it also has a REST counterpart.

Let’s talk about each of the options, but this time, not just about how they look, but also about their design implication.

String pressKey(String key);
One API to rule them all. There’s an issue with its design though. If the UI doesn’t tell the server: “I’m ready”, or if it was refreshed in the middle of an operation, there could be a disconnect between what the UI displays and the server’s state. And where there’s disconnect, there will be bugs.

String pressKey(null);
Still a single API, but this time, we’re talking with the server. In order to reset, we’re using an API called “pressKey“, when no one has actually pressed any key. Exposing a special meaning with “null” to the outside world is not something I like to do. The last guy who did it called it “the billion dollar mistake”. Not only is nothing actually being pressed, no nulls have been pressed in the making of this post. We’d like the interfaces to speak the language of the domain as much as possible, and this one kind of breaks it.

So how about that reset-as-a-startup?:
String pressKey("C");

It’s a more attractive option, since the initialized system should behave as if it was reset with the “C” key. There is still the issue that nobody actually pressed anything. And although I don’t like planning and designing for future requirements, there may be a future where resetting results has a different behavior than the one on startup. In that future, we might want a different functionality when pressing “C”.

void pressKey(String key); String getDisplay();

If we separate the get/set operations (and follow the command-query separation principle), we can get a nice, clean interface. At the start, we’ll call the “getDisplay” API to show what the calculator’s state is at the moment. Which may not be zero (it can be the last held result).
This is an interesting feature, although this kind of behavior was not requested. We need to show “0” in the beginning.

We can assume that “getDisplay” will result in 0, but that gives a separate meaning to “getDisplay” – if used at the start it shows zero, regardless of what the server holds. Additionally, these APIs require two calls to the server every time, which may be costly and slow.

void initializeCalculator();
Which leads us to a separate API for initialization. We probably need it – to tell the server the UI is ready. However, where does the zero come from? We still don’t want to press anything. So maybe the initializing method can return what to display:

String initializeCalculator();

While this is promising, there is some inside knowledge about what the return value means that we’ll need to document, so people will know from now on. Tests will help, but maybe a better name can help:

String initializeAndGetDisplay();

It’s better, but still does two things. For now it will do. Next time I’ll tell you my pick and write some tests.

But before we go, you’re probably thinking – what we did here is a bit of cheating. We should have the API discussion while we write the tests. This is where we should actually do that thinking, not before writing even “@Test“. In fact, before the glorious TDD days, we’d write a design up front (in a 500 page document), and invent that API (hopefully thinking also about alternatives). Did we just jump back in time?

As opposed to a series of posts, our (and my) thinking is not linear. Neither “proper” TDD nor “proper” design is how we think. TDD, and development in general, is an iterative process. In this series, so far I’ve been writing and deleting stuff, just because I’ve learned things on the way and even (gasp) changed my mind a couple of times. It’s ok to think up front (I’m a huge fan of thinking!), and it’s ok to let things change on the way. That’s how we think, learn and work, and we should embrace it.

Real Life TDD With Spring – The Road to the First Test II

Gil Zilberfeld describes options for API design for TDD Spring applications
What does TDD look like in real-world development of microservices or web interfaces? That's what this series is all about.
1. Introduction | 2. The requirements | 3. Test case analysis I | 4. Test case analysis II | 5. Test case analysis III | 6. Test case analysis IV | 7. Setting up a project with Spring Boot | 8. Which tests to write? | 9. API design

Last time, we started thinking about considerations for picking the right test to start with. We decided to start with integration tests. So, which cases do we want to cover?

Obviously, they should capture the main functionality of the system. Going over our test cases, the following are the main capabilities of our system:

  • Pressing keys
  • Displaying results
  • Calculations

All the other cases are either relative edge cases (error handling, overflows, etc.) or more complex scenarios. While these are proper behaviors in our system, they can be covered by unit tests (so it seems right now). For our first set of integration tests we want the simpler stuff. Later we might consider adding more cases.

So we can choose the following cases for our integration tests:

  • Pressing keys and displaying results: "1", "2", "3" => "123"
  • Calculations: "1", "+", "3", "=" => 4

While these capture the main capabilities, I’d like to add the initialization case too:

  • Nothing => “0”

Why?

First, this could be important for building the system as a whole. Delivering an end-to-end story that displays “0” is both easy to do (looks like it, anyway) and valuable. It is good feedback for us that our system works.

Second, this is obviously a quick way to get to a working feature. If you’ve already thought a couple of steps ahead (and that’s not a bad thing to do, even with TDD), you’ve figured that passing the calculation tests would take a lot more work than the initialization test. A quick win is something we all can use. Plus, it looks like a very simple test, that will also drive the first API design.

What’s not to like?

Initial Analysis

Let’s think it through. We have a few core assumptions already:

  • We’ll be using TDD for the inner logic, which we didn’t design yet (obviously)
  • We’ll be using REST services as the API, with input and output as strings
  • Input will come in the form of “pressed keys”
  • Output will come out in the form of “displayed result”

If that’s all we go on, we can presume that we need an API that looks like this:

and its REST service wrapper. We press a key, then return what to display. Let’s call it the “pressKey” API for short.
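As a sketch (assuming standard Spring MVC annotations; the path, names and the internal delegate are placeholders), the REST wrapper around such a pressKey API could look like this:

    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.PostMapping;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class CalculatorController {

        // POST /calculator/keys/7  ->  returns the string to display
        @PostMapping("/calculator/keys/{key}")
        public String pressKey(@PathVariable String key) {
            return internalPressKey(key);
        }

        // Placeholder for the internal pressKey API discussed above.
        private String internalPressKey(String key) {
            return key;
        }
    }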

For all other cases, we’ll need this API. However, with the initialization case, no key is being pressed. Hmm.

We have a couple of design options regarding our API suggestion.

  1. The UI will present the 0. The initialized system does not need to be asked what to display. Once the initial zero is displayed, we can use the “pressKey” API.
  2. Use the “pressKey” API, but give it another meaning. For example, use it like this in the beginning:

    or

    In both alternatives, the back-end will know to respond with “0”.
  3. We can split our “pressKey” API into two separate APIs, each in charge of a different operation.

This way, on initialization we just call the “getDisplay” API.

4. Or maybe, we need a separate API for initializing, in addition to the original “pressKey“.

We would call this in the beginning, telling the calculator we’re starting and we’re off. In this version the front-end will display the zero (as with option 1). Or we can use this  version:

A bit awkward: why would “initialize” return a value?

Decisions, decisions. But you see, we’re starting to think about how we operate the system, as part of the use cases. We want to find the best alignment. The tests will follow.

I’ll give you my pick next time, but here’s your chance to suggest other options (with explanation) and name your favorite.

Real Life TDD With Spring – The Road to the First Test I

Gil Zilberfeld describes the thinking about selecting tests for TDD with Spring
What does TDD look like in real-world development of microservices or web interfaces? That's what this series is all about.
1. Introduction | 2. The requirements | 3. Test case analysis I | 4. Test case analysis II | 5. Test case analysis III | 6. Test case analysis IV | 7. Setting up a project with Spring Boot | 8. Which tests to write? | 9. API design

When we set up the project, we ran a couple of tests (integration and unit) to see if the environment we have runs them. There was no actual logic involved, but we got initial feedback that the platform is ready to work as-is.

The next logical step is to set up a couple of real tests. But what kind of tests should we start with, and how do they fit our TDD effort?

The right stuff

In our test analysis, we ran through a lot of cases, and some of them can serve as acceptance tests. As acceptance tests, they prove that the system works for the customer as specified. As such they would operate the UI and the entire system.

The main issue with testing through the UI is volatility – the UI tends to change a lot more than functionality. Which will cause us to change our tests frequently – a big effort, with relatively small value.

Another issue with testing through the UI is that we add another layer (or more) for the tests to pass through. Although they give us a “thumbs up” when they pass, we’re adding more pitfalls for the tests to fail for the wrong reason.

It’s true that integration tests can fail because of everything in that black box we write. But our UI tests can also fail because of network issues, security, and configuration between the UI and the API.

What happens then?

Let’s talk for a minute about the cost of “maintaining the tests”. Although the test failure is the signal, what happens next is we start investigating everything – the tests, the code, the environment they ran, the build system, and other things as well. This takes time (sometimes a lot). If we find out the tests failed for “the right reason” – the code doesn’t do what we intended it to do – that’s good. They fulfilled their job.

If the tests failed for any other reason – tests are no longer correct, environment issues, missing files that were not added to source control – we call that “waste”. This is not just a waste of time – these failures erode our trust in the tests we have, and reduce our motivation to keep writing them. On top of the wasted investigation time, it sometimes results in homework: fix the tests, add the file, re-configure the environment.

If maintenance cost is high in integration and system tests, why not just skip all those and just write unit tests? They wouldn’t fail for the wrong reason – they are isolated, they test very small scope, and they are cheap to write.

If we only wrote unit tests, which don’t check the whole system, we wouldn’t have proof that the system actually works. So as we write tests, selecting the right type of tests becomes a balancing act. The testing pyramid model describes the resulting set of tests.

Ok, so far we agree that we don’t want UI tests, and we’ll need to select API and unit tests. How is that related to TDD?

The Big, The Small and the Ugly

Test-first adds another dimension to the game. In order to use TDD to make progress quickly, we get pulled toward unit tests. Using TDD from the API level is possible, but we won’t move as quickly. We’ll need to build APIs and call them in the tests, add beans to configurations, set up dependencies, all slower than writing tests for the code itself. In TDD we expect very short cycles of development. Every couple of minutes we have another passing test. Going outside to the API level, depending on how complex the system is, the cycle can turn into hours (and sometimes days, I’ve seen it) between passing tests.

On the other hand, we still want those bigger tests. We’d want to set up a few API tests that act as guardrails. They make sure that we’re on the right track, while we’re messing with internal things inside the system. Having these big-scope tests can lead us in the right direction.

If we write them BDD-style, they can even tell the story in the domain language, not “ugly” REST API dialect. Remember those test cases? We wrote them in our own language, not using any technobabble. We can use BDD tools (e.g. Cucumber) for that. Or we can simply write tests in a readable manner.

Another advantage of having the bigger tests is that they help us think about the APIs of the system. With test-first (at any scale) we think about how we talk with the system. The integration tests can help us there as well.

If we had all the time in the world, we would cover all the code with all kinds of tests. But we don’t. So we need to choose which cases we want to use as the API tests, and which to leave as unit tests.

Let’s start off with a couple of API tests, then move to TDD at the unit level.

Real Life TDD With Spring – Initial Project Setup

Gil Zilberfeld describes the initial Spring project, tests and TDD setup
What does TDD look like in real-world development of microservices or web interfaces? That's what this series is all about.
1. Introduction | 2. The requirements | 3. Test case analysis I | 4. Test case analysis II | 5. Test case analysis III | 6. Test case analysis IV | 7. Setting up a project with Spring Boot | 8. Which tests to write? | 9. API design

Finally! Actual TDD and code and Spring!

Well, we’ll start with Spring first. But first a reminder.

We’ve already constrained ourselves with the architecture. This is going to be a web based, REST services based Spring application. In the general case, the architecture should support building the app in the most reliable, quick and cheap way. In our case, it’s because I want to improve my Spring skillz.

If you want to follow up, I’ll be posting the code on github. I’m going to have a single project, and tag each version with the post’s name and date. For example, this post is tagged 1.0.

Generation S

The first thing I’ll do is create a project in Spring Boot. It’s the quickest way to set up a Spring based project these days (I used Spring Boot 2.0).

What needs to go into our project setup? We’ll select a maven project with the following metadata:

Group: tddWithSpring
Artifact: calculatorDisplay

I’m selecting the following dependencies for the generation of the project:

  • Web – Obviously, we’re building a web app.
  • JPA – For database access.
  • H2 – In memory database for our integration tests

JPA and H2 are not needed to begin with, since, if you recall, we don’t have any persistence requirements yet. We can start without them and add them later, based on our needs. But since I’m running the show, we’ll add both dependencies now. (One of the other things I did was add commented-out properties for those two in the application.properties file under resources.)

Now, let’s press the “Generate Project” button, and… don’t you love technology? We’ve got a starter project. The regular Maven setup includes source files and tests.

But we still need to do some small modifications and cleanups to the project before we start.

Remove all clutter

First, let’s remove the mvnw files that were created as part of the project generation. We don’t need them. Let that be a lesson to you – not everything that can be generated is actually good for you.

The generated project already contains a CalculatorDisplayApp class, and an empty CalculatorDisplayApplicationTests test class. Note that this class is for integration tests – you can see it annotated with @SpringBootTest and @RunWith(SpringRunner.class). I’ve renamed the class IntegrationTests for now.

Let’s run the integration test (which I’ve renamed also), the one that was generated:

The empty test passes, imagine that. But there’s a whole lot of logging going on while it’s running. These Spring logs may be helpful (maybe), but for us, they’re just a distraction. Digging around StackOverflow, I found the solution to remove all logging below ERROR level (that’s unneeded DEBUG and INFO lines). It involves adding a logback-test.xml file to the test/java/resources folder. Each line tells the minimal logging for each package. For example:
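The file isn't shown in this excerpt; a minimal sketch of such a logback-test.xml (the package names are examples) would be:

    <configuration>
        <!-- Each logger line sets the minimal level we want to see for a package -->
        <logger name="org.springframework" level="ERROR"/>
        <logger name="org.hibernate" level="ERROR"/>

        <appender name="CONSOLE" class="ch.qos.logback.core.ConsoleAppender">
            <encoder>
                <pattern>%d{HH:mm:ss.SSS} %-5level %logger - %msg%n</pattern>
            </encoder>
        </appender>

        <root level="ERROR">
            <appender-ref ref="CONSOLE"/>
        </root>
    </configuration>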

That’s a lot less noise on the console. With those in, we’ve minimized the logging to something more acceptable.

Run, test, run!

The first thing I usually do to make sure a system runs is to fail a test and then pass it. I do it with the integration test (I’ve added assertTrue(false) and then changed it to true). The test runs as expected.

However, running this single test clocks in between 11-15 seconds from the command line (using “mvn test“, not including compilation). Interestingly enough, running the same test from within Eclipse (“Run as JUnit Test“) takes around 5-7 seconds. (Times are on my laptop, YMMV).

That tells us a lot about the overhead of running the Spring framework, and also about Maven. The more complex our software gets, the more we’ll see this overhead grow, and that’s bad for our feedback cycle. We need integration tests, but for our TDD purposes, we’ll need quick-running unit tests as well.

Here’s a thought: If I’m already in learning mode, why not use  JUnit 5? Sure, why not. Add the following to the POM file:
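The exact snippet isn't shown here; at the time, a typical addition would be something like the JUnit 5 (Jupiter) dependencies (versions are examples; depending on the Surefire version, a plugin update may also be needed to run them):

    <!-- inside the <dependencies> section of the POM -->
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.2.0</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-engine</artifactId>
        <version>5.2.0</version>
        <scope>test</scope>
    </dependency>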

To check that JUnit works, I’ve added a regular test class (unsurprisingly called UnitTests), with a small test (same as before). This is a regular JUnit test, with no @RunWith annotation:
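The class isn't reproduced here, but a sketch of such a plain JUnit 5 test (no Spring, no runner) is:

    import static org.junit.jupiter.api.Assertions.assertTrue;

    import org.junit.jupiter.api.Test;

    public class UnitTests {

        @Test
        public void sanityCheck() {
            assertTrue(true);
        }
    }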

This test alone runs in Eclipse in less than 50 microseconds. That’s a very good feedback time.

However, when we run it with Maven (“mvn test“), which runs both integration and unit tests, we’re back up to 10-15 seconds. To run just the unit test, we can run Maven with a specific test class, just the UnitTests class:
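The exact command isn't shown here; with Surefire it would typically be something like:

    mvn test -Dtest=UnitTests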

Hmmm. This brings the time to around 5-7 seconds. Still not micro-feedback, but a lot better, without getting the whole Spring framework grinding us to a halt.

So, what have we learned?

  • We have a project ready for work, that was sort-of easy to set up.
  • We know our tests are running.
  • We know that integration tests that need Spring have a big overhead to do nothing. We also learned that running tests through Maven adds another overhead that we might want to work around.

Next up: Deciding which tests to write.

Code for this post is here.