Building Real Software: War Games, Pair Testing and Other Fun Ways to Find Bugs

I've already examined how important good testing is to the health of a project, a product and an organization. There’s a lot more to good testing than running an automated test suite in Continuous Integration and forcing someone to walk through functional test scripts and checklists. A good tester will spend time exploring the app, making sure that they really understand it and that the app actually makes sense, finding soft spots and poking at them to uncover problems that nobody expected, and providing valuable information and feedback to the team.

What’s better than a good tester? Two good testers working together…

Pair Testing – Two Heads are Better than One

Pair Testing is an exploratory testing approach where two testers work through scenarios together, combining their knowledge of the app and their unique skills and experience to reproduce hard-to-find bugs or to do especially deep testing of some part of a system. As in pair programming, one person drives, defining the goals of the testing session, the time limit and the starting scenarios, and providing the hands at the keyboard; the other person navigates, observing, taking notes, advising, asking questions, double-checking, challenging and causing trouble. As a pair they can help each other through misunderstandings and blocks, build on each other’s ideas to come up with new variations and more ways to attack the app, and push each other to find more problems. Together they also have a better chance of noticing small inconsistencies and errors that either of them working alone might miss or dismiss as unimportant.

Pair testing can be especially effective if you pair developers and testers together – a good tester knows where to look for problems and how to break software; a good developer can use their understanding of the code and design to suggest alternative scenarios and variations, and together they can help each other recognize inconsistencies and identify unexpected behaviour. This is not just a good way to track down bugs – it’s also a good way for people to learn from each other about the app and about testing in general. In our team, developers and testers regularly pair up to review and test hard problems together, like validating changes to complex business rules or operational testing of distributed failover and recovery scenarios.

Pair testing, especially pairing developers and testers together, is a mature team practice. You need testers and developers who are confident and comfortable working together, who trust and respect each other, who understand the value and purpose of exploratory testing, and who are all willing to put the time in to do a good job.

War Games and Team Testing

If two heads are better than one, then what about four heads, or eight, or ten or …?

You can get more perspectives and create more chances to learn by running War Games: team testing sessions which put a bunch of people together and try to get as close as possible to recreating real-life conditions. In team testing, one person defines the goals, roles, time limit and main scenarios. Multiple people end up driving, each playing different roles or assuming different personas, some people trying crazy shit to see what happens, others being more disciplined, while somebody else shoulder surfs or looks through logs and code as people find problems. More people means more variations and more chances to create unexpected situations, more eyes to look out for inconsistencies and finishing details (“is the system supposed to do this when I do that?”), and more hands to try the same steps at the same time to test for concurrency problems. At worst, you’ll have a bunch of monkeys bashing at keyboards and maybe finding some bugs. But a well-run team test session is a beautiful thing, where people feed on each other’s findings and ideas and improvise in a loosely structured way, like a jazz ensemble.

Testing this way makes a lot of sense for interactive systems like online games, social networks, online stores or online trading: apps that support different kinds of users playing different roles with different configurations and different navigation options that can lead to many different paths through the app and many different experiences.

With so many people doing so many things, it’s important that everyone (or at least someone) has the discipline to keep track of what they are doing and to make notes as they find problems. But even if people are keeping decent notes, sometimes all that you really know is that somebody found a problem, and nobody is sure exactly what they were doing at the time or what the steps are to reproduce it. It can be like finding a problem in production, so you need to use similar troubleshooting techniques, relying on logs and error files to help retrace the steps.

Team testing can be done in large groups, sometimes even as part of acceptance testing or field testing with customers. But there are diminishing returns: as more people get involved, it’s harder to keep everyone motivated and focused, and harder to understand and deal with the results. We used to invite the entire team into team testing sessions, to get as many eyes as possible on problems, and to give everyone an opportunity to see the system working as a whole (which is important when you are still building it, and everyone has been focused on their pieces).

But now we've found that a team of four to six people who really understand the system is usually enough: better than two people, and much more efficient than ten, or a hundred. You need enough people to create and explore enough options, but a small enough group that everyone can still work closely together and stay engaged.

Team testing is another mature team practice: you need people who trust each other and are comfortable working together, who are reasonably disciplined, who understand exploratory testing and who like finding bugs.

Let's Play a Game

We relied on War Games a lot when we were first building the system, before we had good automated test coverage in place. It was an inefficient but effective way to increase code coverage and to find good bugs before our customers did.

We still rely on War Games today, but now it’s about looking for real-life bugs: testing at the edges, testing weird combinations and workflow chaining problems, looking closely for usability and finishing issues, forcing errors, finding setup and configuration mistakes, and hunting down timing errors and races and locking problems.

Team testing is one of the most useful ways to find subtle (and not so subtle) bugs and to build confidence in our software development and testing practices. Everyone is surprised, and sometimes disappointed, by the kinds of problems that can be found this way, even after our other testing and reviews have been done. This kind of testing is not just about finding bugs that need to be fixed: it points out areas where we need to improve, and raises alarms if too many – or any scary – problems are found.

These problems are surprising because War Games only make sense in later stages of development, once you have enough of a working system together to do real system testing, and after you have already done your basic functional and regression testing. It’s expensive to get multiple people together, to set up the system for a group of people to test, to define the roles and scenarios, and then to run the test sessions and review the results. You don’t want to waste everyone’s time finding basic functional bugs or regressions that could and should have been picked up earlier. So whatever you do find should be a (not-so-nice) surprise.

War Games can also be exhausting: good exploratory testing like this is only effective if everyone is intensely involved, and that takes energy and commitment. This isn’t something that we do every week or even every iteration. We do it when somebody (a developer, a tester or a manager) recognizes that we’ve changed something important in the workflow, the architecture or the business rules, or decides that it’s time to run through key scenarios together as a group and see what we can find, because we’ve made enough small changes and fixes over enough iterations or because we’ve seen some funny bugs in production recently.

What makes War Games work is that they are games: when you get smart people working together on a problem, intensity, competition and a sense of play build naturally.

“Framing something like software testing in terms of gaming, and borrowing some of their ideas and mechanics, applying them and experimenting can be incredibly worthwhile.”
Jonathan Kohl, Applying Gamification to Software Testing

When people realize that it’s fun to find more bugs and better bugs than the other people on the team, they push each other to try harder, which leads to smarter and better testing, and to everyone learning more about the system. It’s a game, and it can be fun – but it’s serious business too.

2 comments:

Jim Bird said...: @Casey, excellent point about bug bounties. I didn't think about bug bounties in the context of a multi-player game, but the same ideas apply: involving more people with more and different perspectives, creating competition, building on each other's results.; April 4, 2013 at 8:22 AM
Jonathan Kohl said...: Cool post Jim, I'm glad some of my work piqued your interest.

Another angle is co-operative games. Healthy competition can be good, but it needs to be managed well. Co-operative games have enormous potential to mobilize people towards a common goal and to make them feel like they are working at something larger than themselves, which translates into better engagement and more productivity.

My current focus is on co-operative games and ARGs (alternate reality games). There is a lot of compelling information out there to apply to software testing and software development.

-Jonathan; April 4, 2013 at 8:46 AM