The Downsides of Excessive Mocks (framework) and Stubs in Unit Testing

A tiny disclaimer: when I say mock, I am referring to a Mock Test Double. The frameworks used to create such Test Doubles are also known as dynamic mock libraries, a term used by Mark Seemann. All code shown here is written in Kotlin.

A robust and reliable test suite makes us more confident during refactors by preventing broken code from going unnoticed. It can also speed up code reviews, since the tests act as documentation of the happy path, error handling, and edge cases. Lastly, it can highlight design problems when the code is difficult to test. Using mocks and stubs has become standard practice to isolate components and ensure reliable tests (London School). However, it’s imperative to recognize the fragility and potential pitfalls associated with excessive use of dynamic mock libraries and stubs.

Clarifying Mocks and Stubs:

Before delving into the drawbacks, let’s clarify the terms mock and stub. A mock is typically created with a mocking framework (e.g., Mockito) and helps emulate and examine interactions between the System Under Test (SUT) and its dependencies. The classic example is verifying that a method on a mock was invoked. Stubs, on the other hand, supply the SUT with specific data from its dependencies: they are functions that return canned values, which we then use to assert some condition.
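To make the two roles concrete, here is a minimal sketch that uses hand-written doubles instead of a mocking framework, so it runs without any dependencies. All names (ExchangeRates, AuditLog, PriceConverter) are hypothetical and exist only for this illustration. The recording double plays the role a framework-generated mock would play; some authors would call it a spy.

```kotlin
// Hypothetical dependencies, used only for illustration.
interface ExchangeRates {
    fun rateFor(currency: String): Double
}

interface AuditLog {
    fun record(message: String)
}

// Hypothetical SUT: reads data from one dependency, notifies the other.
class PriceConverter(private val rates: ExchangeRates, private val log: AuditLog) {
    fun toEuros(amount: Double, currency: String): Double {
        val result = amount * rates.rateFor(currency)
        log.record("converted $amount $currency")
        return result
    }
}

// Stub: feeds canned data into the SUT.
class StubExchangeRates(private val rate: Double) : ExchangeRates {
    override fun rateFor(currency: String): Double = rate
}

// Mock-style double: records interactions so the test can examine them.
class RecordingAuditLog : AuditLog {
    val messages = mutableListOf<String>()
    override fun record(message: String) {
        messages += message
    }
}

fun main() {
    val log = RecordingAuditLog()
    val sut = PriceConverter(StubExchangeRates(rate = 0.5), log)

    // Stub-driven assertion: checks the value computed from the canned rate.
    check(sut.toEuros(10.0, "USD") == 5.0)

    // Mock-style assertion: checks that the interaction happened.
    check(log.messages.size == 1)
}
```

The first check is about what the SUT returns; the second is about how the SUT talks to its collaborator.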

The Fragility of Excessive Dynamic Mocking:

In many workplaces, mocking libraries are used in virtually every unit test, leading to repetitive setup code. This repetitive process of creating mocks and stubs can hinder scalability and create maintenance overhead, either because the same configuration for a given dependency has to be recreated in multiple places or because all of these mocks must be updated whenever something in the API changes.

Moreover, the ease of constructing the SUT’s dependencies with dynamic mocking libraries tempts developers to overlook the importance of good design. Imagine an interface with many methods, of which the SUT uses only one: a violation of the Interface Segregation Principle. Writing a working implementation of this interface by hand and passing it to the SUT’s constructor is troublesome. With a dynamic mocking library, on the other hand, it’s a breeze, so the design problem goes unnoticed.
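As a rough sketch of the alternative, the SUT can depend on a narrow interface that declares only the method it needs. A hand-written test double then becomes a one-liner and no mocking library is required. The names below (NameLookup, UserService, Greeter) are hypothetical, and `fun interface` assumes Kotlin 1.4 or newer.

```kotlin
// Narrow role interface: the single capability the SUT actually needs.
fun interface NameLookup {
    fun findName(id: Int): String?
}

// Hypothetical wide interface. Production code can still implement it,
// but the SUT below never has to know about the extra methods.
interface UserService : NameLookup {
    fun rename(id: Int, name: String)
    fun delete(id: Int)
    fun exportAll(): List<String>
}

// Hypothetical SUT that depends on the narrow interface only.
class Greeter(private val names: NameLookup) {
    fun greet(id: Int): String = "Hello, ${names.findName(id) ?: "stranger"}!"
}

fun main() {
    val knownNames = mapOf(1 to "Ana")

    // The test double is a lambda converted to the functional interface.
    val greeter = Greeter(NameLookup { id -> knownNames[id] })

    check(greeter.greet(1) == "Hello, Ana!")
    check(greeter.greet(2) == "Hello, stranger!")
}
```

If building the double by hand feels painful, that pain is often a useful signal that the dependency is too wide.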

Without paying attention, we can end up in a situation where we spend a couple of hundred lines just configuring the dependencies of the SUT, setting up things like:

val someDependency: SomeDependency = mock()
whenever(someDependency.callSomething(any())).doReturn(SomethingElse())

This extensive setup code makes reviews painful because it obscures the intention of the test. Reviewers end up going back and forth between test and production files, which is time-consuming and inefficient. Unit tests should be as declarative as possible.
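One possible way to keep the setup declarative is to write a small in-memory fake once and reuse it across tests, instead of repeating stub configuration in each of them. This is only a sketch under assumed names (User, UserRepository, RenameUser); it is not taken from any real codebase.

```kotlin
data class User(val id: Int, val name: String)

// Hypothetical dependency of the SUT.
interface UserRepository {
    fun findById(id: Int): User?
    fun save(user: User)
}

// Written once, reused by every test that needs a UserRepository.
class InMemoryUserRepository(vararg initial: User) : UserRepository {
    private val users = initial.associateBy { it.id }.toMutableMap()

    override fun findById(id: Int): User? = users[id]

    override fun save(user: User) {
        users[user.id] = user
    }
}

// Hypothetical SUT.
class RenameUser(private val repository: UserRepository) {
    fun rename(id: Int, newName: String): Boolean {
        val user = repository.findById(id) ?: return false
        repository.save(user.copy(name = newName))
        return true
    }
}

fun main() {
    // The whole setup is one line that states the starting data.
    val repository = InMemoryUserRepository(User(1, "Ana"))
    val sut = RenameUser(repository)

    check(sut.rename(1, "Anna"))
    check(repository.findById(1) == User(1, "Anna"))
    check(!sut.rename(42, "Nobody"))
}
```

The trade-off is that the fake itself is code that has to be maintained, ideally by whoever owns the real implementation.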

Fragile Interaction Testing:

As we saw, a mock helps emulate and examine interactions between a particular dependency and the SUT being exercised. Interaction testing verifies whether specific methods of a dependency were invoked. The fact that the SUT has called a method of its dependency is an implementation detail and, in most cases, should not leak into the test suite. This leakage results in brittle tests that require updates whenever implementation details change, undermining the value of automation. We can write this kind of test using the verify function from the Mockito library:

verify(someDependency).callSomething(any())

Abusing this type of test, that is, only checking calls without asserting behavior, can lead to a false feeling of completeness: test coverage is high but quality is low, because verifying the calls alone does not tell us that the expected result actually happens. It is a mere assumption.

If a set of tests needs to be manually tweaked by engineers for each change, calling it an “automated test suite” is a bit of a stretch! (Software Engineering at Google, p.223)
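The brittleness described in this section can be sketched without a mocking library. Below, two behaviorally equivalent versions of a hypothetical function fill a cart. The cart records the names of the methods called on it, which roughly mimics what a verify call inspects. RecordingCart, fillCartV1, and fillCartV2 are made-up names for this illustration.

```kotlin
// Hypothetical dependency that both stores items and records which methods were called.
class RecordingCart {
    private val contents = mutableListOf<String>()
    val calls = mutableListOf<String>()

    fun add(item: String) {
        calls += "add"
        contents += item
    }

    fun addAll(items: List<String>) {
        calls += "addAll"
        contents += items
    }

    fun items(): List<String> = contents.toList()
}

// Version 1 of the SUT: adds the items one by one.
fun fillCartV1(cart: RecordingCart, items: List<String>) = items.forEach { cart.add(it) }

// Version 2, after a refactor: same observable result, different calls.
fun fillCartV2(cart: RecordingCart, items: List<String>) = cart.addAll(items)

fun main() {
    val wanted = listOf("book", "pen")
    val v1 = RecordingCart().also { fillCartV1(it, wanted) }
    val v2 = RecordingCart().also { fillCartV2(it, wanted) }

    // State-based assertion: holds for both versions, so it survives the refactor.
    check(v1.items() == wanted)
    check(v2.items() == wanted)

    // Interaction-based assertion: tied to version 1 and no longer true for version 2.
    check(v1.calls == listOf("add", "add"))
    check(v2.calls != listOf("add", "add"))
}
```

A test that asserted "add was called twice" would have to be rewritten after the refactor, even though nothing a user can observe has changed.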

Risks of Stubbing External Functions:

Stubbing functions from external sources that are not owned by us or fully understood can lead to a mismatch between the stubbed behavior and the actual implementation. This practice poses a risk of breaking present or future preconditions, invariants, or postconditions in the external function.

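// Example of stubbing a function (Kotlin)
// Production file
import kotlin.math.abs

class Calculator {
    fun sum(a: Int, b: Int): Int {
        return abs(a + b) // always returns a non-negative integer
    }
}

// Test file: MyClassTest
// The imports below are an assumption (JUnit 4 and mockito-kotlin); adjust to your setup.
import org.junit.Assert.assertEquals
import org.junit.Test
import org.mockito.kotlin.doReturn
import org.mockito.kotlin.mock
import org.mockito.kotlin.whenever

class MyClassTest {
    private val calc: Calculator = mock()
    private val sut: MyClass = MyClass(calc)

    @Test
    fun test_add() {
        whenever(calc.sum(1, -3)).doReturn(2) // Stubbing the sum function

        val result = sut.getNewValue(1, -3)

        assertEquals(2, result) // expected value first, then the actual one
    }
}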

The test passes. But by stubbing the function sum, we are forced to duplicate the details of its contract in the test, and there is no way to guarantee that the stub is, or will remain, faithful to the actual implementation. Just by reading the signature of the sum method, there is no guarantee that this function always returns non-negative integers. See more about depending on implicit interface behavior.

Time went by, and the owner of the Calculator#sum method decided to change the postcondition: instead of always returning non-negative integers, sum may now return negative values as well. The owner updates their tests and runs the entire test suite of the project (assuming that all code belongs to the same repository). The worst happens: MyClassTest#test_add still passes, giving a false feeling of safety. If a particular behavior is always expected but not explicitly promised by the contract, you should write a test for it (The Beyoncé Rule).
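The scenario can be reproduced in a small, dependency-free sketch. The body of MyClass is not shown in this post, so the version below is a hypothetical one that simply delegates to the calculator. The changed implementation is modeled as a subclass (ChangedCalculator), and the stub is hand-written instead of created with Mockito. All of these are assumptions made for illustration.

```kotlin
import kotlin.math.abs

// Original implementation: never returns a negative integer.
open class Calculator {
    open fun sum(a: Int, b: Int): Int = abs(a + b)
}

// Stand-in for the later change: negative results are now possible.
class ChangedCalculator : Calculator() {
    override fun sum(a: Int, b: Int): Int = a + b
}

// Hand-written stub that always returns the same canned value.
class StubCalculator(private val canned: Int) : Calculator() {
    override fun sum(a: Int, b: Int): Int = canned
}

// Hypothetical SUT: assumed to delegate to the calculator.
class MyClass(private val calc: Calculator) {
    fun getNewValue(a: Int, b: Int): Int = calc.sum(a, b)
}

// A Beyoncé Rule style test: states the expected postcondition explicitly.
fun sumIsNeverNegative(calc: Calculator): Boolean = calc.sum(1, -3) >= 0

fun main() {
    // With the stub, the result is 2 no matter what the real sum does.
    check(MyClass(StubCalculator(2)).getNewValue(1, -3) == 2)

    // With the real dependency, the test tracks the actual behavior.
    check(MyClass(Calculator()).getNewValue(1, -3) == 2)
    check(MyClass(ChangedCalculator()).getNewValue(1, -3) == -2)

    // The explicit postcondition test catches the change.
    check(sumIsNeverNegative(Calculator()))
    check(!sumIsNeverNegative(ChangedCalculator()))
}
```

A test written against the real Calculator, or an explicit test of the postcondition, would have failed as soon as the behavior changed; the stubbed test cannot.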

Conclusion:

Excessive use of mocks and stubs in unit testing can introduce fragility, hinder maintainability, and lead to incomplete test coverage. Awareness of these downsides is crucial for fostering a robust and reliable testing strategy.

At Vinted, we still rely heavily on dynamic mocking libraries to write tests. However, recognizing their fragility is the first step toward treating tests as first-class citizens.

If the testing culture is an afterthought, the test suite’s quality can be, and most certainly will be, put at risk, providing a false sense of safety where everything appears to be verified.

A very good quote from the Mockito repo: “If everything is mocked, are we really testing the production code?”

This blog post was inspired by the following resources:

  • Software Engineering at Google curated by Titus Winters, Tom Manshreck, and Hyrum Wright
  • Effective Software Testing by Maurício Aniche
  • Unit Testing: Principles, Practices, and Patterns by Vladimir Khorikov