Experience Report: Exploratory vs Scripted Testing

The case for scripted manual testing

Before I start: the purpose of this story is not to discredit scripted manual testing. Scripted manual testing is less important than it was at the time of this project, but it still has a place in the tester's toolbox under certain circumstances, and I hope this story will illuminate that context.

Long ago when dinosaurs roamed the earth, in the 1990s, I worked testing life-critical telecom software on mainframe computers. At the time, the industry standard for software test planning and execution was the IEEE 829 standard. This is a protocol that specifies certain activities and certain documents that ultimately guide software testing activity. The Wikipedia article does not mention it, but the scripted tests for IEEE 829 were informed by other standard documents, like requirements and design documents. While this standard was ultimately superseded, it provides an understandable framework for an approach to testing software that generates a detailed set of steps, informed by available information, to be executed in a particular order, with results to be reported in particular ways.

Around 1997/1998 I had become a senior tester in my organization, and I had an even more senior colleague, Angela, who knew the software extraordinarily well and who was a thoughtful and effective tester and test manager. We both read the code and used the debugger often, we both could query the databases and filesystems, and we were both technical and highly informed about the software. Angela was a big fan of scripted 829-style testing. I, however, had been reading about a new concept called "Exploratory Testing" (ET), and I was discovering that ET could be unusually effective, however unorthodox it seemed at the time. Angela and I had an enormous amount of mutual respect for each other, and we discussed these ideas often.

And then we got a chance to test our ideas. We had a big new feature come along, with really excellent requirements documents and design documents, and a generous amount of time in the schedule for testing before the scheduled release date. Both of us had unrestricted access to the requirements docs, the design docs, the code, and the test environment. We also had a pretty sophisticated bug tracking system in place.

We decided that Angela would create an 829-style test plan with test cases based on the documentation, while I would take an Exploratory Testing approach to the project based on the same documentation. Angela would execute the tests in order, top to bottom, while I would start exploratory testing according to my perception of importance based on the documentation at hand. We would measure the number of defects found as well as the rate at which we found them.

While my story is anecdotal, it is thorough, and to this day I have never seen a similar study. Our results informed our work for years afterward.

The most striking differences between an ET approach and a scripted approach had to do with the rate of bug discovery, the severity of the bugs discovered, and the thoroughness of coverage over time.

Using ET techniques I discovered many more bugs, and many more high-severity bugs, much more quickly than the scripted approach did. ET paid off hugely early in the project: both the rate of bug discovery and the severity of the bugs discovered were much higher early on than with the scripted tests.

But the rate of bug discovery of the scripted approach remained constant, while the rate of bug discovery by ET dropped over the course of the project. Toward the end of the time we had for testing, my ET work was discovering approximately zero meaningful problems, while the scripted tests continued to identify subtle issues until the very end of the project. (The project was successful, by the way, with more or less no bugs in production.)

In conclusion, we discovered empirically that ET uncovers more important bugs more quickly than a scripted testing approach, while a scripted approach is more thorough over a long testing window. For the rest of my time at that company I would tend to lead testing on projects where test time was limited, or where thoroughness of testing was less critical, while Angela would tend to lead testing on larger-scale projects with longer windows for testing. It worked out well for us.

But this was the 1990s. Today we can change production software in minutes or seconds instead of months or years. Our customers have an insatiable appetite for new features. Continuous Integration, Continuous Delivery, and automated testing at every level of the software have revolutionized software delivery in the past couple of decades. With turnaround times for most software features so short, ET is a clear winner over scripted test plans and test suites. There simply isn't the time or money available to make a large-scale scripted test effort worthwhile.

Except for the cases where the effort is worthwhile. From time to time experienced testers may discover that it is in fact worthwhile to get human eyeballs on a large set of features of a software system, in a certain order, to confirm that those features continue to work as expected.

On several occasions later in my career I have been in a position to test systems that are highly visual, whose value relies on looking good to human eyeballs in multiple contexts, where a visual error could cost goodwill and ultimately lose customers and lose revenue. That is one example where a scripted manual regression test tour with a strong emphasis on how things look, and how transitions look, is critically important and cannot be automated in any meaningful way. Others may exist.

The experienced software tester has multiple tools in their toolbox. Just because 829-style scripted manual testing is old and largely obsolete does not mean that the approach has no value. You just have to know when and why to use it.
