Byzantine Testing

I participated in my first code retreat yesterday. It was fun! Our group spent six sessions pairing on implementing Conway’s Game of Life, using a different set of constraints each time to guide our development. In one session we did ping-pong pairing, in another we couldn’t pass primitives as method arguments (except to constructors), and in a third we had to swap laptops with another group halfway through and finish their implementation.

One session in particular stood out to me. It’s good practice in TDD to only write enough code to make the test pass. That often involves temporarily writing a deliberately oversimplified implementation.

For example, if we wanted to implement a factorial function, we might start by writing a test for the base case, like:

it "handles the base case of 0! == 1" do
  expect(factorial(0)).to eq(1)
end

To make this test pass we might choose to implement a full factorial function. That’d be writing code that we don’t need, though, so instead we’d start with:

def factorial(_)
  1
end

Next we’d write a few more test cases, eventually ending up with a full implementation of the factorial function.

In this session, we divided each pair into a tester and an implementer. The implementer was instructed to be “evil” in a lazy sort of way: to do the bare minimum to make the tests pass, possibly by acting in ways that the implementer wouldn’t expect.¹ Meanwhile, the tester wasn’t allowed to see the implementer’s code, and had to test a black-box implementation.

This meant that the tester still had the power to define the interface of the system, but that that interface had to be so exhaustively defined that even an adversarial implementation would be forced to provide the right solution (given the constraint of not making the “lazy/evil” code more complex than the correct code would be).

I thought that was a really interesting idea! It reminded me of Byzantine failure, a distributed systems failure mode in which we assume that some computers are so buggy as to be actively hostile.

After conducting a cursory Web search I determined that no one has united these ideas yet, so I’m going to refer to this style of adversarial TDD as Byzantine testing. To perform Byzantine testing, we should assume that:

The implementation will be so buggy as to be malicious,
Our language, testing framework, and external libraries not under test still work as we’d expect, and
The implementation is more lazy than it is hostile. This implies that when you’re testing, you should assume that there won’t (for example) be a huge switch statement in the implementation checking only your test cases, unless that’s genuinely simpler than a correct solution would be.

I’m not sure that this is actually a useful concept in practice, since it seems like overkill in most cases, but I think it’s inherently valuable to assign names to ideas. I might even try to write some real code in this style in the future, too, just to try it out!

For example, my pair, the tester, compared two sets of (Python) objects we were defining. My lazy implementation was to redefine the __hash__ method to always return True, which sure made the tests pass. ↩

You might like these textually similar articles: