The Payback on Automated Unit Tests

I’m a Test-Driven Development (TDD) convert now. All of my “business logic” (aka “domain logic”) and more than 95% of my framework logic are covered by automated unit tests, because I write the test before I write the code, and I only write enough code to make the failing test pass.
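If you haven’t seen that cycle in action, one iteration looks roughly like this. It’s a made-up example (I’ll use Python here just because it’s compact), not code from any real project:

```python
# Step 1 ("red"): write a failing test for the next piece of behavior.
def test_overdue_invoice_accrues_interest():
    invoice = Invoice(amount=100.00, days_overdue=30)
    assert invoice.amount_due() == 101.00  # 1% for every 30 days overdue

# Step 2 ("green"): write only enough code to make that test pass.
class Invoice:
    def __init__(self, amount, days_overdue):
        self.amount = amount
        self.days_overdue = days_overdue

    def amount_due(self):
        return self.amount * (1 + 0.01 * (self.days_overdue // 30))

# Step 3: refactor if anything smells, re-run the tests, and repeat with
# the next failing test.
```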

It’s really hard to find anyone talking about a measurable ROI for unit testing, but it does happen. One study found that it took, on average, 16% longer to develop the initial application using TDD than it did using normal development methods. Another reported “management estimates” of 15% to 35% longer development times using test-driven development. Both studies reported a very significant reduction in defects. The implication is that the payback comes somewhere in the maintenance phase.

From personal experience I would say that as I gain experience with TDD, I get much faster at it. At the beginning it was probably doubling my development time, but now I’m closer to the estimates in the studies above. I’ve also shifted a bit from “whitebox testing” (where you test every little inner function) toward “blackbox testing”/“integration testing”, where you test at a much higher level. I find that writing tests at a higher level means you write fewer of them, and they’re more resilient to refactoring (when you change the design of your software later to accommodate new features).
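Here’s a rough illustration of that difference, again with invented Python code:

```python
# Stand-ins for the production code under test.
def parse_quantity(text):
    # Inner helper: turns user input like '  12 ' into an int.
    return int(text.strip())

class Inventory:
    def __init__(self):
        self._on_hand = {}

    def receive(self, sku, qty_text):
        self._on_hand[sku] = self._on_hand.get(sku, 0) + parse_quantity(qty_text)

    def on_hand(self, sku):
        return self._on_hand.get(sku, 0)

# Whitebox-style: pins down one little inner function, and breaks if you
# ever refactor that helper away.
def test_parse_quantity_strips_whitespace():
    assert parse_quantity("  12 ") == 12

# Blackbox/integration-style: tests the same behavior through the public
# entry point, so it survives a redesign that renames or removes
# parse_quantity entirely.
def test_receiving_stock_increases_on_hand_count():
    inventory = Inventory()
    inventory.receive("WIDGET-7", "  12 ")
    assert inventory.on_hand("WIDGET-7") == 12
```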

A Long Term Investment

It’s hard to justify TDD because the extra investment of effort seems a lot more real and substantial than the rather flimsy value of quality. We can measure hours easily, but quality, not so much. That means we have a bias in our measurements.

Additionally, if and when TDD pays us back, it’s in the future. It’s probably not in this fiscal year. Just like procrastination, avoiding TDD pays off now. As humans, we’re wired to prefer immediate rewards over long-term gains. Sometimes that works against us.

A Theory of TDD ROI

I’m going to propose a model of how the ROI works in TDD. This is scientific, in that you’ll be able to make falsifiable predictions based on this model.

Start out with your software and draw out the major modules that are relatively separate from each other. Let’s say you’re starting with a simple CRUD application that just shows you data from a database and lets you Create, Read, Update, and Delete that data. Your modules might look like this:

  • Contact Management
  • Inventory Management

If you implement this with TDD versus without it, I suspect you’ll see the typical 15% to 35% increase in effort from the TDD methodology. That’s because the architecture is relatively flat and there’s minimal interaction. Contact Management and Inventory Management don’t have much to do with each other. Now let’s implement two more modules:

  • Orders
  • Purchasing

These two new modules are also relatively independent of each other, but they both depend on the Contact Management and Inventory Management modules. That just added 4 dependency relationships. The software is getting more complex, and it’s getting harder to understand the effect of small changes. The two new modules can still be changed relatively safely because nothing depends on them yet, but changes to the first two can start to cause trouble.

Now let’s add a Permissions module. Obviously this is a “cross-cutting” concern – everything depends on the Permissions module. Since we had 4 existing modules, we’ve just added another 4 dependency relationships.

Ok, now we’ll add a Reporting module. It depends on the 4 original modules, and it also needs Permissions information, so we’ve added another 5 dependency relationships.

Are you keeping count? We’re at 13 relationships now with just 6 modules.
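To make the model concrete, here’s that dependency graph as a little Python sketch (the module names are from above; everything else is just illustration):

```python
# Each module maps to the modules it depends on.
deps = {
    "Contact Management":   ["Permissions"],
    "Inventory Management": ["Permissions"],
    "Orders":     ["Contact Management", "Inventory Management", "Permissions"],
    "Purchasing": ["Contact Management", "Inventory Management", "Permissions"],
    "Permissions": [],
    "Reporting":  ["Contact Management", "Inventory Management",
                   "Orders", "Purchasing", "Permissions"],
}

print(sum(len(d) for d in deps.values()))  # 13 relationships, 6 modules

def dependents(module):
    # Every module that must be regression tested when `module` changes.
    affected, frontier = set(), {module}
    while frontier:
        frontier = {m for m, ds in deps.items()
                    if any(d in frontier for d in ds)} - affected
        affected |= frontier
    return affected

print(sorted(dependents("Inventory Management")))
# ['Orders', 'Purchasing', 'Reporting']
```

That dependents() set is exactly what you’re on the hook to regression test when a module changes, which sets up the scenario below.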

Now let’s say we have to add a function that will find all customers (Contact Management) who have a specific product on order (Orders) that came from some manufacturer (Purchasing and Contact Management) with a certain Lot # (Inventory Management), and print a report (Reporting). Obviously this will only be available to certain people (Permissions).

That means you have to touch all 6 modules to make this change. Perhaps while you’re messing around in the Inventory Management module you notice that the database structure isn’t going to support this new feature. Maybe you have a many-to-one relationship where you realize you really should have used a many-to-many relationship. You change the database schema and you change the Inventory Management module, but instead of just re-testing that module, you now have to fully re-test all the modules that depend on it: Orders, Purchasing, and Reporting. It’s likely you made assumptions about that relationship in those modules. What if those need to change too? Does the effect cascade to all the modules in the software? Likely.

It doesn’t take long to get to the point where you need to do a 100% regression test of your entire application. How many new features potentially touch all modules? How long does it take to do a full regression test? That’s your payback.

You can measure the regression test time, and if you use a tool like NDepend you can measure and graph the dependencies of an existing application. Using your source control history, you can go back and determine how many different modules were touched by each new feature and bug fix since the beginning of time. You should be able to calculate:

  • How much time it takes to regression test each module
  • The probability of each module changing during an “average” change
  • The set of modules to regression test when any given module changes

Given that, you can figure out the average time to regression test the average change.
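Here’s that calculation as a sketch, with completely invented numbers (in practice they’d come out of your test plans, NDepend, and your source control history):

```python
# Invented per-module regression test times (hours) and the probability
# that an "average" change touches each module.
test_time = {"Contact Management": 4, "Inventory Management": 4,
             "Orders": 6, "Purchasing": 6, "Permissions": 2, "Reporting": 8}
p_change  = {"Contact Management": 0.25, "Inventory Management": 0.25,
             "Orders": 0.15, "Purchasing": 0.15, "Permissions": 0.05,
             "Reporting": 0.15}

# Transitive dependents from the graph earlier: what to re-test when
# each module changes.
retest = {
    "Contact Management":   {"Orders", "Purchasing", "Reporting"},
    "Inventory Management": {"Orders", "Purchasing", "Reporting"},
    "Orders":      {"Reporting"},
    "Purchasing":  {"Reporting"},
    "Permissions": {"Contact Management", "Inventory Management",
                    "Orders", "Purchasing", "Reporting"},
    "Reporting":   set(),
}

# For each module: P(it changes) * (its own test time plus the test time
# of everything that depends on it), summed over all modules.
avg_regression_time = sum(
    p * (test_time[m] + sum(test_time[d] for d in retest[m]))
    for m, p in p_change.items()
)
print(avg_regression_time)  # ~18.9 hours with these made-up numbers
```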

Obviously, for TDD to pay off, the average regression test time must be longer than the extra 15% to 35% it takes to write each feature (assuming you keep following TDD practices during the maintenance phase). Whatever regression testing time you save beyond that premium is payback against the initial 15% to 35% extra you spent developing the application in the first place.

What kind of numbers are we talking about?

Let’s run some numbers. A lot of places say 30 to 50% of software development time is spent testing. Let’s assume the 50% end applies to apps with very interconnected dependencies. Also, let’s say our team pays an extra 33% premium to use a TDD methodology.

Now take a project that would originally take 6 months to develop and test. With TDD the development takes about 33% longer, so +2 months. The average change takes 3 days to code and test, or 4 days with TDD. Let’s say a full regression test on something that took 6 months to develop would take about 2 days (from personal experience, a 3-month project had a regression test plan that took about 1 day to run through).

Without TDD, a feature would take 3 days to write and test, and then 2 days to do a full regression test. Using TDD, it would take 4 days to write and test, but zero time to regression test, so you gain a day per change.

Therefore, since you had to invest an extra 2 months (about 40 working days for one developer) in the first place to do TDD, you’d see a break-even once you were in the maintenance phase and had implemented about 40 changes, each taking 4 days, which means 160 working days. That’s about 8 months. And that ignores the fact that regression test time keeps growing as you add more features, which would only make the payback come sooner.
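That arithmetic as a quick sanity check (working days, one developer):

```python
# Up-front TDD premium: 33% of a 6-month project, about 2 months.
premium_days = 40            # ~20 working days per month

days_without_tdd = 3 + 2     # code and test + full regression test
days_with_tdd = 4            # code and test, no regression pass
saved_per_change = days_without_tdd - days_with_tdd   # 1 day per change

changes_to_break_even = premium_days // saved_per_change  # 40 changes
working_days = changes_to_break_even * days_with_tdd      # 160 days
print(changes_to_break_even, working_days)  # ~8 months of maintenance work
```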

Obviously your numbers will vary. The biggest factor is the ratio of regression test time to the time it takes you to implement a new feature. More dependencies mean more regression testing.

Conclusion

If you have a very flat architecture with few dependencies, then the TDD payback takes longer to arrive (if there even is a payback). On the other hand, if you have highly interdependent software (modules built on top of other modules), TDD pays you back quickly.

5 thoughts on “The Payback on Automated Unit Tests”

  1. Jeff M

    I agree Scott – I’ve started using TDD where I’m at now, and I can’t imagine ever going back. Not only does it very likely improve quality, but it helps to keep the developer focused on one specific feature, and should prevent the temptation of writing too many things all at once (which happens to me), if you go by the book.

    Also, there’s the satisfaction of seeing those little green lights turn on, which makes me feel like everything is going to be okay 🙂

  2. Scott Whitlock Post author

    @Jeff M – Yes, between unit test green lights and ReSharper’s little green squares, there’s lots of bling to keep my OCD at bay for a while, I think. 🙂

  3. Scott Whitlock Post author

    @Jakob – there are no frameworks that I’m aware of. However, it’s definitely possible to do TDD with TwinCAT 3. It’s just a matter of how far you want to take it. The biggest payback will be unit tests for your PLC library projects. In that case, just create a test PLC project, reference the library, and start creating one program per test. The program should have a Pass/Fail BOOL output, and ideally a STRING error/failure message output.

    What you really want is something like a test runner which runs all the programs and spits out a report of what failed and what the error message was when it failed. TwinCAT 3 doesn’t offer reflection, so test discovery is hard. Manually coding a test runner would be doable, but a bit tedious.

    You could write your tests in function blocks instead of programs. If you did that, they could implement an interface. Then you could declare your tests as an array of that interface, and your test runner is just a loop going through and executing each function block. I think it would work…

    Would be worth a blog post, I think.

  4. Alex W

    I’ve been doing unit tests for TwinCAT lately, but the thing that needs to be solved is the continuous integration. How can we automate the deployment and execution of the simulator code when activating a config requires entry of a captcha?
