Test::Class Hierarchy Is an Antipattern

Test::Class is particularly good at testing object-oriented code, or so it is said. You can create a hierarchy of test classes that mirrors the hierarchy of classes under test. But this pattern, common in Perl projects, is conspicuously missing from the rest of the xUnit world, and with good reason.

This essay was formerly serialized as a trilogy of posts on The Perl Shop Blog.


What Is Test Hierarchy, and What’s Wrong With It?

We’ve all heard of it.

Our project has a class Animal that implements method move() (because all animals can move). This class has a test class AnimalTest that derives from Test::Class and has a test for $animal->move().

So far so good.

Our project also has a class Bat that derives from Mammal, which derives from Animal, and implements method fly(). So we create a test class BatTest that derives from MammalTest, which in turn derives from AnimalTest, and has a test for $bat->fly(). That means that BatTest not only exercises all the behavior of Bat but also of Mammal and Animal, because it inherits all the tests in its ancestors.

Wow! What a cool feature! We get all that testing functionality essentially for free by inheritance!

And if the foregoing description sounded confusing, just imagine how good it’s going to get as we extend the object hierarchy.

Repeat for umpteen different classes.

This arrangement of test classes is what I mean by Test::Class Hierarchy, or more generally, Test Hierarchy.
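In code, the arrangement might look like the following sketch. The bodies of Animal, Mammal, and Bat, the test_* method names, and the class_under_test() hook are all hypothetical, made up here for illustration; only the inheritance shape comes from the description above.

```perl
use v5.14;
use warnings;

# Hypothetical classes under test, inlined so the sketch is self-contained.
package Animal {
    sub new  { my ($class) = @_; return bless {}, $class }
    sub move { return 'moving' }
}
package Mammal {
    use parent -norequire, 'Animal';
    sub breathe { return 'breathing' }
}
package Bat {
    use parent -norequire, 'Mammal';
    sub fly { return 'flying' }
}

# Test Hierarchy: each test class inherits from the test class of the
# superclass under test, and so inherits all of its test methods.
package AnimalTest {
    use parent 'Test::Class';
    use Test::More;

    # Hypothetical hook that subclasses override to name the class they test.
    sub class_under_test { return 'Animal' }

    sub test_move : Test {
        my ($self) = @_;
        is($self->class_under_test->new->move, 'moving', 'can move');
    }
}
package MammalTest {
    use parent -norequire, 'AnimalTest';
    use Test::More;

    sub class_under_test { return 'Mammal' }

    sub test_breathe : Test {
        my ($self) = @_;
        is($self->class_under_test->new->breathe, 'breathing', 'can breathe');
    }
}
package BatTest {
    use parent -norequire, 'MammalTest';
    use Test::More;

    sub class_under_test { return 'Bat' }

    sub test_fly : Test {
        my ($self) = @_;
        is($self->class_under_test->new->fly, 'flying', 'can fly');
    }
}

# Runs every loaded test class: AnimalTest, MammalTest, and BatTest.
Test::Class->runtests;
```

Only three test methods are written here, but the run executes six: test_move runs three times and test_breathe twice, once for each class that inherits them.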

Multiple prominent sources in the Perl community recommend Test Hierarchy.

And most notably, when I ask a roomful of Perl developers about their experiences with Test::Class, I’m sure to hear at least one person complain about “the rabbit hole of inheritance,” as jnap once put it in a conversation. Of course, not every project misuses Test::Class and its support for inheritance, but as he noted, “I’ve just seen it so wildly abused.” (And to be fair, this is not the only abuse of Test::Class, but it’s the pattern I’m examining at the moment.)

  • Test Hierarchy makes for fragile, overlapping tests. When we inherit test methods in this way, we end up with test action at a distance. That is, each test class includes tests that are defined in its superclasses, which are completely different modules. A change to any of the superclasses can produce failures in any and all of the subclass tests.
  • These tests are obscure. We can’t know by looking at the test module what functionality it’s testing, at least not without following the inheritance hierarchy all the way to the top.
  • And they’re slow to boot. The test suite takes many times longer to run than it needs to, because the same tests are run again in every subclass.
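How much repetition there is depends on the shape of the hierarchy. The following back-of-the-envelope sketch tallies it for one made-up case: the class names beyond AnimalTest, MammalTest, and BatTest, and all of the per-class test counts, are assumptions chosen only to make the arithmetic concrete.

```perl
use strict;
use warnings;

# Hypothetical hierarchy: test class => parent test class (undef at the root).
my %parent = (
    AnimalTest => undef,
    MammalTest => 'AnimalTest',
    BirdTest   => 'AnimalTest',
    BatTest    => 'MammalTest',
    SlothTest  => 'MammalTest',
);

# Hypothetical number of test methods defined directly in each class.
my %own_tests = (
    AnimalTest => 3,
    MammalTest => 2,
    BirdTest   => 2,
    BatTest    => 1,
    SlothTest  => 1,
);

# Under Test Hierarchy, a class runs its own test methods plus every test
# method it inherits from its ancestors.
sub tests_run {
    my ($class) = @_;
    my $total = 0;
    for (my $c = $class; defined $c; $c = $parent{$c}) {
        $total += $own_tests{$c};
    }
    return $total;
}

my ($independent, $hierarchy) = (0, 0);
for my $class (sort keys %parent) {
    $independent += $own_tests{$class};    # each test method runs once
    $hierarchy   += tests_run($class);     # inherited methods run again
}

printf "independent: %d test runs, hierarchy: %d test runs\n",
    $independent, $hierarchy;
# prints "independent: 9 test runs, hierarchy: 25 test runs"
```

In this example the three AnimalTest methods alone account for 15 of the 25 runs, because all five classes execute them.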

In the words of Alyssa Mastromonaco (or maybe her publisher): Who thought this was a good idea?

Enough Perl projects use Test Hierarchy that it pops up in criticisms of Test::Class itself, and the negative effects are a significant point when they do.

It bears noting, however, that this practice is strongly discouraged in the rest of the programming world. It’s so rare, in fact, that Gerard Meszaros doesn’t even mention it in his book xUnit Test Patterns.

Rather, the recommended practice is to inherit our test classes directly from Test::Class (or possibly from a project- or subsystem-specific test base class, but that’s a different post). In general, we use as little hierarchy as possible, and whatever hierarchy we do use is organized according to the needs of the tests, not the needs of the system under test. And we never inherit test methods (although we may inherit setup and teardown code).
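As a minimal sketch of that practice, here are two test classes that each derive directly from Test::Class. The method bodies and test names are hypothetical; the point is only that BatTest knows nothing about AnimalTest, even though Bat is a descendant of Animal.

```perl
use v5.14;
use warnings;

# Hypothetical classes under test, inlined so the sketch is self-contained.
package Animal {
    sub new  { my ($class) = @_; return bless {}, $class }
    sub move { return 'moving' }
}
package Mammal {
    use parent -norequire, 'Animal';
}
package Bat {
    use parent -norequire, 'Mammal';
    sub fly { return 'flying' }
}

# Each test class derives directly from Test::Class and tests only the
# code that its own class under test defines.
package AnimalTest {
    use parent 'Test::Class';
    use Test::More;

    sub test_move : Test {
        is(Animal->new->move, 'moving', 'Animal can move');
    }
}
package BatTest {
    use parent 'Test::Class';    # not MammalTest, not AnimalTest
    use Test::More;

    sub test_fly : Test {
        is(Bat->new->fly, 'flying', 'Bat can fly');
    }
}

Test::Class->runtests;
```

Two test methods, two test runs. A failure in test_move points at Animal, and a failure in test_fly points at Bat.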

In summary, we write independent test classes, and we never inherit test methods.

That qualifies Test Hierarchy as a Perl antipattern:

  1. It’s a commonly used structure that, despite initially appearing to be appropriate and effective, has more bad consequences than good; and
  2. Another solution exists that is documented, repeatable, and proven to be effective.

Test Hierarchy Produces Poor Unit Tests

A unit test, by definition, tests a unit of software, no more, no less. On the one hand, we have unit tests, which test a single module or class. On the other hand, we have integration tests, which test how multiple modules or classes work together. We want each unit test to poke and prod only the class that it tests. We want each subsystem integration test to test a natural subsystem, e.g., the data-export subsystem. We want our system tests to test the whole system. And we don’t want any test to be affected by any other units, subsystems, or systems.

When our software depends on other software that may change over time, our tests may suddenly start failing because the behavior of the other software has changed. This problem, which is called Context Sensitivity, is a form of Fragile Test.

Whatever application, component, class, or method we are testing, we should strive to isolate it as much as possible from all other parts of the software that we choose not to test. This isolation of elements allows us to Test Concerns Separately and allows us to Keep Tests Independent of one another. It also helps us create a Robust Test by reducing the likelihood of Context Sensitivity caused by too much coupling between our SUT [system under test] and the software that surrounds it. (xUnit Test Patterns: Refactoring Test Code. Gerard Meszaros. Addison-Wesley Professional, 2007.)

If terms like Context Sensitivity and Fragile Test feel familiar, it’s not just a coincidence.

Test Hierarchy produces tests that purport to be unit tests but that don’t actually test isolated units.

Let’s say we have a Bat class that is a subclass of Mammal. If the tests use Test Hierarchy, then BatTest not only tests the Bat code, but also the superclass Mammal code and its superclass Animal.

This seems to make some intuitive sense, because after all, Bat can do all the things that Mammal and Animal can do, all the methods that it inherits from those classes. But this intuition misses an important distinction. The unit is whatever code is in the Bat.pm module, not whatever the Bat class can do.[1]

When a Bat unit test fails, it should indicate that we made a mistake in Bat.pm, not any other module.

Our bad BatTest doesn’t just test the code in Bat.pm, but also the code in Mammal.pm and Animal.pm. This makes it an integration test (not a unit test), because it doesn’t just test its own module but other modules as well.

And it’s an integration test we don’t need. Generally, we write integration tests that exercise some system feature, like “export Foo data in CSV format.” This might involve setting up the Foo data fixture, invoking the appropriate export feature, then validating the CSV file that it generates. But we don’t need module tests that invoke low-level methods on other modules.

In fact, who said that there’s only one test per class?

Here at The Perl Shop, we generally create a separate test module per method or feature. So we’d create move.t, eat.t, and breathe.t, each of which tests a different Animal method. This way, we can group together tests by class method, and easily self-document which tests correspond with which feature.[2]

We also inline our test classes in our .t scripts, which keeps the test code close to the test script and cuts in half the number of files we need to maintain. And makes it impossible to subclass them.
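A move.t written along these lines might look like the sketch below. Everything specific in it is an assumption made for illustration: the t/Animal/move.t path, the Animal::MoveTest name, and an Animal whose move() takes a distance and updates a position. In a real project the Animal class would be loaded with use Animal rather than inlined.

```perl
# t/Animal/move.t (hypothetical path)
use v5.14;
use warnings;

# Stand-in for the real class under test.
package Animal {
    sub new {
        my ($class) = @_;
        return bless { position => 0 }, $class;
    }
    sub move {
        my ($self, $distance) = @_;
        $self->{position} += $distance;
        return $self;
    }
    sub position { return $_[0]{position} }
}

# The test class lives in the .t script itself, next to the code that runs it.
package Animal::MoveTest {
    use parent 'Test::Class';
    use Test::More;

    # Runs before every test method, so each one gets a fresh fixture.
    sub make_fixture : Test(setup) {
        my ($self) = @_;
        $self->{animal} = Animal->new;
    }

    sub starts_at_origin : Test {
        my ($self) = @_;
        is($self->{animal}->position, 0, 'a new animal has not moved');
    }

    sub moves_forward : Test {
        my ($self) = @_;
        $self->{animal}->move(3);
        is($self->{animal}->position, 3, 'move() advances the position');
    }

    sub moves_accumulate : Test(2) {
        my ($self) = @_;
        $self->{animal}->move(2);
        is($self->{animal}->position, 2, 'first move');
        $self->{animal}->move(5);
        is($self->{animal}->position, 7, 'second move adds to the first');
    }
}

Animal::MoveTest->runtests;
```

Every test in the file is about move(), so the file name alone says what is covered, and eat.t and breathe.t would follow the same shape.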

We’ve had great success with these practices, and they’re fundamentally incompatible with Test Hierarchy.

You might have also noticed this in Testing Strategies for Modern Perl. In chapter 2, we create a TicTacToe::BusinessLogic::Game class, which is tested by new.t, board.t, and move.t.

Why Programmers Use the Test Hierarchy Antipattern

I think there are a couple reasons why programmers use Test Hierarchy.

Test Hierarchy may appear to “just make sense” at first blush. After all, you have a hierarchy of classes under test—superclasses and subclasses—and you have a collection of test classes. It seems very symmetrical to have the test classes mirror the classes under test.

However, there’s no design justification for the test classes to be arranged in a parallel hierarchy. The easiest way to see this is to consider what happens when developers get tired of Test Hierarchy. What do they do? They drop back to procedural tests, with no inheritance at all. If you don’t need Test Hierarchy to test object-oriented code using procedural tests, why do you need it when using Test::Class? Answer: You don’t.

Test inheritance should only be used to meet the needs of the tests, not the needs of the code under test.

This usually means that if we have test superclasses, they specifically contain shared setup and teardown code or test utility functions. See the Testcase Superclass pattern, by which a test class can inherit common functionality from an abstract test superclass. Using this pattern, the test hierarchy is organized in order to share common code across the entire project’s tests or an entire subsystem’s tests.

(But use SharedTestModule qw(shared_function) is still preferred over inheritance, because it more explicitly states what is being shared and where.)
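Both options can be sketched together. Here OrganismTest is a Testcase Superclass that holds only setup and teardown code, and SharedTestModule exports a helper by name. The helper's body, the temporary-directory fixture, and the Bat attributes are hypothetical; the BEGIN block merely stands in for a SharedTestModule.pm file so that the sketch fits in one script.

```perl
use v5.14;
use warnings;

# Stand-in for SharedTestModule.pm, which would normally be its own file.
BEGIN {
    package SharedTestModule;
    use Exporter 'import';
    our @EXPORT_OK = qw(shared_function);

    # Hypothetical helper: canned constructor arguments for several tests.
    sub shared_function { return { name => 'Test Bat', wingspan_cm => 30 } }

    $INC{'SharedTestModule.pm'} = __FILE__;    # mark as loaded for `use`
}

# Hypothetical class under test.
package Bat {
    sub new  { my ($class, %args) = @_; return bless {%args}, $class }
    sub name { return $_[0]{name} }
}

# Testcase Superclass: shared setup and teardown only, no test methods.
package OrganismTest {
    use parent 'Test::Class';
    use File::Temp ();

    sub make_tmpdir : Test(setup) {
        my ($self) = @_;
        $self->{tmpdir} = File::Temp->newdir;
    }
    sub drop_tmpdir : Test(teardown) {
        my ($self) = @_;
        delete $self->{tmpdir};    # the directory is removed on destruction
    }
}

package BatTest {
    use parent -norequire, 'OrganismTest';
    use Test::More;
    use SharedTestModule qw(shared_function);    # states what is shared, and from where

    sub test_name : Test {
        my $bat = Bat->new(%{ shared_function() });
        is($bat->name, 'Test Bat', 'name comes from the shared test data');
    }

    sub test_has_tmpdir : Test {
        my ($self) = @_;
        ok(-d $self->{tmpdir}->dirname, 'setup provided a scratch directory');
    }
}

BatTest->runtests;
```

The inherited setup arrives implicitly through the class hierarchy, whereas the import line names the helper and its source at the point of use.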

I’ve also seen programmers appeal to the Liskov Substitution Principle. This is the idea that if Bat is a subclass of Mammal, then any code that requires a Mammal can be handed a Bat without ill effects. Barbara Liskov and Jeannette Wing formally defined it like this:

Let ϕ(x) be a property provable about objects x of type T. Then ϕ(y) should be true for objects y of type S where S is a subtype of T. (“Behavioural Subtyping Using Invariants and Constraints.” Barbara Liskov, Jeannette Wing. CMU-CS-99-156. MIT Lab, July 1999.)

In other words, a subclass adheres to the same interface contract as its superclasses.

Some programmers will say that if Bat is a subclass of Mammal, then a Bat can do anything that a Mammal can. In other words, a Bat “is a” Mammal. Therefore, it directly follows that BatTest should test all the Mammal behaviors that Bat inherits.

No, it doesn’t, and no, it shouldn’t.

To understand why, consider a couple simple cases.

  • If a Mammal has a care_for_young() method, then every Bat must also be able to care_for_young(). This does not mean that the way bats care for their young is exactly the same as every other mammal. In fact, it’s distinctly different, because baby bats have needs that are distinct from the needs of other baby mammals. Mammal::care_for_young() may even be an abstract method that dies with a “must be implemented in subclass” error.

  • Similarly, every Animal can move(). That means every Mammal also can move(), because Mammal is a subclass of Animal. By extension every Bat can also move(), because Bat is a subclass of Mammal. Now explain to me how the way a bat moves is identical to the way a sloth moves or the way a tarantula moves. It isn’t.
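The first case can be sketched with plain Test::More. The Sloth class and the strings that the methods return are hypothetical, chosen only to show a shared contract with different behavior behind it.

```perl
use v5.14;
use warnings;
use Test::More;

package Mammal {
    sub new { my ($class) = @_; return bless {}, $class }

    # Abstract: part of the contract, but with no behavior of its own.
    sub care_for_young { die "care_for_young() must be implemented in subclass\n" }
}
package Bat {
    use parent -norequire, 'Mammal';
    sub care_for_young { return 'nurses pup at the roost' }
}
package Sloth {
    use parent -norequire, 'Mammal';
    sub care_for_young { return 'carries baby on her belly' }
}

# The contract is shared: every Mammal subclass responds to care_for_young().
can_ok($_, 'care_for_young') for qw(Bat Sloth);

# The behavior is not shared: each class needs its own expectations.
is(Bat->new->care_for_young,   'nurses pup at the roost',   'Bat behavior');
is(Sloth->new->care_for_young, 'carries baby on her belly', 'Sloth behavior');

# A test of Mammal's own behavior says nothing useful about Bat.
my $lived = eval { Mammal->new->care_for_young; 1 };
my $error = $@;
ok(!$lived, 'Mammal::care_for_young() is abstract');
like($error, qr/must be implemented in subclass/, 'and it says so');

done_testing;
```

A MammalTest method that asserted the abstract method's behavior would fail if BatTest inherited it, and one that asserted Sloth's behavior would be wrong for Bat. The only thing the subclasses have in common is the method's existence.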

A subclass adheres to the same interface contract as its superclasses. It does not necessarily implement identical behaviors.

Therefore, just because a Bat “is a” Mammal, that doesn’t mean that a BatTest “is a” MammalTest. Actually, no, a BatTest is not a MammalTest. Not even close. Both BatTest and MammalTest are just tests. Or they might, at most, be derived from an OrganismTest abstract class that contains helper methods to set up and manage test fixtures common to all organisms.

Test::Class can test anything straight Test::More can, and vice versa. The power in Test::Class is not, as legend says, in testing object-oriented code. The power Test::Class brings is its ability to collect related test methods together, run them independently, and inherit setup and teardown. (I’ll explore more of these details in Testing Strategies for Modern Perl.) Each test class should be derived directly from Test::Class or from an abstract subclass thereof. Never inherit test methods. Just don’t do it.


[1] Formally, Bat.pm doesn’t directly define the Bat class. Rather, it defines an implicit “subclass mixin”: all the methods and attributes in Bat.pm that are added to (“composed with”) its superclass Mammal. The object system, then, composes the Bat mixin with Mammal to create the full Bat class. Similarly, Mammal.pm conceptually defines a Mammal mixin that is composed with its superclass Animal in order to create the Mammal class.

[2] Another alternative is to have a separate test class per fixture, which is useful if different test methods have different fixture requirements.