This is the transcript for Test & Code, episode 23: Lessons about testing and TDD from Kent Beck
[music] Welcome to Test and Code, a podcast about software development and software testing.
Kent Beck’s Twitter profile says “programmer, author, father, husband and goat farmer”, but I know him best for his work on Extreme Programming, Test First Programming, and Test Driven Development. He’s the one. The reason you know about TDD is because of Kent Beck. I first ran across his writings when I was studying Extreme Programming in the early 2000s. Although I don’t agree with all of the views he expressed on his long and verbose career, I respect him as one of the best sources of information about software development, engineering practices and software testing.
Along with Test First Programming and Test Driven Development, Kent started an automated test framework that turned into JUnit. JUnit, and its model of setup and teardown wrapping test functions, as well as base test class-driven test frameworks, became what we know of as xUnit-style frameworks now, which includes Python’s unittest.
He discussed this history and a lot more on episode 122 of Software Engineering Radio. The episode is titled “The history of JUnit and the future of testing with Kent Beck”, and is from September 26, 2010. I’ll put a link in the show notes. I urge you to download it, and listen to the whole thing. It’s a great interview, still relevant and applicable to testing in any language, including Python.
However, I know that many of you aren’t going to listen to it, and there are a few portions of the interview that I really want to share with you. So here’s what I did. I tracked down the right people to ask permission to pull some clips out of that interview and play on this podcast. They said it was OK. Actually, since SE Radio is part of IEEE now, my request ended up going to someone at IEEE Computer Society. They said yes, which is cool, so here we are. Oh yeah, I did ask Kent via Twitter if it was OK if I introduced him as a goat farmer from Oregon. His reply: “that’s fine”.
So here we go, some bits of software testing wisdom from a goat farmer in Oregon.
[2:22] The first clip is about having your tests be readable and tell a story.
“I always strive for a kind of declarative expression in my tests. You should be able to just kind of read a test and it tells a story. That is, somebody coming along later and reading it should be able to understand something important about the program.”
Sometimes, normal programming good practices don’t apply to software tests. One example is DRY. DRY stands for don’t repeat yourself, and many people take it to mean that if you have any repeated chunks of code, you should put those into a function and call that function instead. Software tests naturally have code similar to other tests, and it’s tempting to put the common lines in a separate function. Here’s Kent on the topic:
“DRY in particular I don’t subscribe to for test code, because I want my tests to read like a story.”
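As a rough illustration of that idea (this is not from the interview), here is a minimal Python sketch. The `ShoppingCart` class and both tests are hypothetical, invented only to show the style: each test spells out its own steps so it can be read top to bottom, even though the setup lines repeat.

```python
# Hypothetical example: ShoppingCart is invented for illustration only.
class ShoppingCart:
    def __init__(self):
        self._items = {}  # name -> (price, quantity)

    def add(self, name, price, quantity=1):
        # Adding the same item again accumulates its quantity.
        _, existing = self._items.get(name, (price, 0))
        self._items[name] = (price, existing + quantity)

    def remove(self, name):
        self._items.pop(name, None)

    def total(self):
        return sum(price * qty for price, qty in self._items.values())


def test_removing_an_item_lowers_the_total():
    # The whole story is in the test body: build a cart, check it,
    # remove something, check again.
    cart = ShoppingCart()
    cart.add("coffee", price=4, quantity=2)
    cart.add("bagel", price=3)
    assert cart.total() == 11

    cart.remove("bagel")
    assert cart.total() == 8


def test_adding_the_same_item_twice_accumulates_quantity():
    # The setup lines repeat on purpose, rather than hiding in a helper.
    cart = ShoppingCart()
    cart.add("coffee", price=4)
    cart.add("coffee", price=4)
    assert cart.total() == 8
```

A shared helper that builds the cart would remove the repetition, but a reader would then have to go and look up the helper to learn what the cart contained. Whether that trade is worth it is a judgment call for each test suite.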
[3:21] Another thing that Kent brought up was the idea that tests should advance your knowledge of the software under test. A test that fails should legitimately tell you new information about the problem in your software. He also warns against having multiple tests that tell you the same thing about your software.
“Tests should have, in medicine they call it differential diagnosis, where they say I’m going to order this test, and based on the results of this, you know, whatever, blood test, I will be able to rule out a bunch of stuff and confirm some other things. So, every test should have this kind of, maybe this is an information theory thing, should be able to differentiate good programs from bad programs. If you have a test and it doesn’t do anything to advance your understanding of good programs and bad programs, then that’s probably a useless test. But if you took the space of all possible programs to solve your problem, you know, almost all of them won’t, and a few of them will. A test should lop off a big portion of that space and say nope, any program that doesn’t satisfy this test is definitely not going to solve the real problem. So, there’s a part of that. And then there’s a sense of redundancy. If you have a bunch of tests that tell you exactly the same thing, then, I would look to see which of them adds the least value and delete them. But they have to really cover exactly the same cases.”
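One way to picture the differential diagnosis idea is the small Python sketch below. It is an illustration under assumptions, not something from the interview: `median_good`, `median_bad`, and the three check functions are all hypothetical names. One check separates the correct program from a plausible wrong one, and the others do not.

```python
# Hypothetical example: two candidate "programs" for computing a median.
def median_good(values):
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_bad(values):
    # Plausible but wrong: forgets to sort the input first.
    mid = len(values) // 2
    if len(values) % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def check_sorted_input(median):
    # Passes for both candidates, so it cannot tell them apart.
    return median([1, 2, 3]) == 2


def check_another_sorted_input(median):
    # Gives the same verdict as check_sorted_input for both candidates,
    # so with respect to these two programs it adds nothing new.
    return median([10, 20, 30]) == 20


def check_unsorted_input(median):
    # Passes for median_good and fails for median_bad:
    # this check rules out a whole family of wrong programs.
    return median([3, 1, 2]) == 2


for candidate in (median_good, median_bad):
    print(
        candidate.__name__,
        check_sorted_input(candidate),
        check_another_sorted_input(candidate),
        check_unsorted_input(candidate),
    )
```

In this sketch the two sorted-input checks are redundant only relative to the two candidates shown. Against some other wrong program they might disagree, which is in line with Kent's caveat that tests have to cover exactly the same cases before one of them is deleted.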
[5:06] This next clip is one of my favourites. You see, when I first learned about Test First Programming and Test Driven Development, I understood it to be useful at the user API level, with an idea of functional units. I also found it very useful to write tests at layer interfaces, especially when I was working on a layer closer to the hardware and I wanted to test, from my level down, functionality that was ready in the hardware but didn’t have upper layers ready yet, and no API available yet. I think the level, the interface where you apply your tests, is a pragmatic decision based on the circumstances you’re in. But that’s not how a lot of people saw it. A lot of TDD proponents, other than Kent, came around and pushed isolated unit tests, and tried to shove end-to-end tests and system tests back over the fence to QA teams. So I’m very pleased to hear Kent talk about testing at different levels, or as he puts it, different scales.
“Something I didn’t communicate very effectively in my first discussions of TDD is the importance of testing at various scales. So, TDD is not a unit testing philosophy. I write tests at whatever scale I need them to be to help me make my next step of progress. So sometimes they’re what somebody else would call a functional test. So, for example, forty percent of the JUnit tests work through the public API. Sixty percent of them are working on lower level objects. The public API is quite good for testing, probably because we’ve written so many tests, so I don’t know if those proportions are – I don’t want to claim those proportions are anything more than one data point, like should you have 40-60, should you have 10-90 or 90-10, I really don’t know, but just this idea of moving – Part of the skill of TDD is learning to move between scales, right. So I write a test that my customer says ‘oh, this scenario should result in a five’. So you write a test that says this scenario should result in a five, and then you’re down deep in the intestines of your program and you’re thinking, oh, I see, well this object when given a five and a seven should return the five. Well that’s a good place to write a test because that’s another piece of the story that needs to be told. But, you know, is that Acceptance Test Driven Development, or is that BDD? I think that erecting rigid walls between the styles is actually a mistake, like the scales, as a programmer I want to understand all those scales. Tests help me understand, so I write tests at all those scales.”
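Here is a minimal Python sketch of what moving between scales might look like, loosely following the five-and-seven example in the clip. Everything in it is an assumption made for illustration: `winning_bid` stands in for a public API, and `LowestBidPicker` stands in for one of the lower level objects.

```python
# Hypothetical example: names and behaviour are invented for illustration.
class LowestBidPicker:
    """Lower level object: given two bids, return the smaller one."""

    def pick(self, first, second):
        return first if first <= second else second


def winning_bid(bids):
    """Public API: return the winning (lowest) bid from a non-empty list."""
    picker = LowestBidPicker()
    winner = bids[0]
    for bid in bids[1:]:
        winner = picker.pick(winner, bid)
    return winner


def test_scenario_results_in_a_five():
    # Customer-scale test: "this scenario should result in a five".
    assert winning_bid([9, 5, 7]) == 5


def test_picker_given_five_and_seven_returns_five():
    # Object-scale test: "this object when given a five and a seven
    # should return the five".
    assert LowestBidPicker().pick(5, 7) == 5
```

Both tests live in the same suite and run the same way. The only difference is the scale at which each one tells its part of the story.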
[8:18] So let’s say you have tests in place that give you information about your system and tell a story well. The tests are software, and have to be maintained. You shouldn’t have tests in your system that are hard to understand, because, at some point that test will fail, and someone will have to figure out why it’s failing. That’s where readability and value are very important. I’m totally sick of people saying that end-to-end tests are fragile, meaning they break all the time. Listen, if you write a test using your user-facing API, even if it’s a long story, it’s kind of like something your customers are going to do with your software. If it breaks, or fails, that’s your customers’ code that will break too. That’s serious! If it really is a test problem, then that’s just weird, but I have to say, end-to-end tests don’t have to be long stories. Focused functional tests can be short. But sometimes it has to be long to match a real customer use model. So be it, it’s long. But if it fails, take it as seriously as you would a customer defect report. Here’s Kent on the topic.
“I still go places and people say ‘oh yeah, we did a bunch of tests, but then the tests stopped working, so we threw them out’, which just seems bizarre to me. I mean, like, Aristotle would be shocked. The logic just doesn’t add up. This test said if the test is running my program is running, and if the test is not running then my program’s not running. And the test stops running, and your next act is to delete the test, or just stop running it or ignore the test report that you get. Like, wow, that means your program’s not running. But somehow, I mean, there’s a lot of other pressures on people other than get your program running. I guess that’s the conclusion that I can draw from that, but it’s kind of, it’s too bad, I think. There’s potential value there, people could produce more value as programmers if they trusted the tests, and paid more attention to them. But I, you know, there’s a lot of other things going on in software development than coding.”
[10:32] So far, I agree with everything I’ve played. I thought it might be fair to play a clip that I don’t agree with. I’ll just play it and discuss it afterwards.
“I think it’s worth being dogmatic as a learning tool, right. What if I just said I’m always going to write tests for everything. And then you discover, oh, I’m glad I did this. Here I’m sorry that I did it. So I won’t do it in – what’s the commonality in the experiences where I wished that I hadn’t written tests, what’s the commonality in the experiences where I’m glad I wrote the tests, then let me infer – I’ll use that to inform my behaviour going forward.”
He’s saying that it’s OK to teach TDD in a dogmatic way, and that people will learn with these training wheels on, and when they outgrow the dogmatism, they’ll let their common sense dictate how much they should test and what needs to be tested and what doesn’t. But, I think history tells us you can’t always rely on people’s common sense to kick in, and there’s a bunch of people out there saying things like “you’re not really doing TDD right”, “that’s not a unit test because you aren’t using mocks”, “unit tests shouldn’t touch the database or the file system”, “that’s not really Scrum, it’s ScrumBut”, and stuff like that. Anyway, I’m sick of the dogmatism and the excuse that people are smart enough to know we don’t really mean test everything. We should teach people what they really ought to do, not some idealised version that they are supposed to just know to change when they get the hang of it.
[12:07] Anyway, this is six years later, and I’d love to get Kent Beck in an interview sometime, and ask him about these clips, and about goat farming. And I’m really curious if he’s loopy about IPAs and pinot noir, like half the rest of Oregon. It’s unreal, especially in the summer, you’d be amazed how hard it is to find a beer that isn’t an IPA, or one of its variants.
So what did we cover?
- Your tests should tell a story.
- Be careful of DRY, inheritance, and other software development practices that might get in the way of keeping your tests easy to understand.
- All tests should help differentiate good programs from bad programs, and not be redundant.
- Test at multiple levels and multiple scopes, where it makes sense.
- Differentiating between TDD, BDD, ATDD, et cetera isn’t as important as testing your software to learn about it. Who cares what you call it?
[13:11] But there’s lots more great stuff in that interview. Please check it out. Show notes can be found at pythontesting.net/23. This episode was brought to you by Patreon supporters. Visit pythontesting.net/support for more info, or go directly to patreon.com/testpodcast, and help keep the show coming. On Twitter, I’m @brianokken, and the show is @testpodcast. Thanks for listening. [music]