How Many Bugs Are Left? The Software QA Puzzle

Software QA with the Lincoln Index

If software QA began with a specific number of bugs, it might be easier to find all the issues. For example, many puzzle books show you a drawing and ask you to find an exact number of hidden objects. Or, they show you a pair of drawings and ask you to spot a certain number of differences between them.

In software development, the target number of differences — or bugs — isn’t always so specific. How much easier would software QA be if someone could whisper in our ear how many bugs there are to find? In reality, we never know. We only know a minimum: if we’ve found 37 bugs, we know there are at least 37 bugs. Maybe there’s one more to find, or maybe there are hundreds — we can’t be sure.

It would be helpful to have a rough idea of how many bugs remain, so we could make the best use of our testing resources. If we always assume there are hundreds of bugs left to find, we may be over-testing one product at the expense of others. If we have reason to suspect we’ve found all the bugs, or nearly all of them, we can move our testers to wherever their time and effort will do the most good.

Rough Estimates

Nothing can tell you exactly how many bugs are left during software QA, but a little math can give you an estimate. Suppose you have a tester who has found some number of bugs. If you knew the probability of her finding each bug during a period of time, you could estimate how many there are to find by dividing her bug count by that probability. But you can’t know what proportion of bugs she’s found without knowing how many there are to find.

Now suppose you have another tester, and he’s also found some number of bugs. You don’t know what proportion of bugs he’s found, either. But by combining the bug counts from the two testers, you can estimate how many bugs there are.

This may seem too good to be true, but it’s not — the key is to consider how many bugs both testers found. Suppose the first tester found 20 bugs and the second found 30, but there was only one bug that both found. You might suspect that there are a lot of bugs to find, since both could find so many with hardly any overlap. On the other hand, if 18 of the bugs on the first list are on the second list, you might feel like your testers have probably found most of them.

The Lincoln Index

You can quantify an estimate with a tool known as the “Lincoln Index.” If the first tester found A bugs, the second found B, and C bugs were in common between them, the estimated total number of bugs would be AB/C. In the first example above, A = 20, B = 30, and C = 1. In this case there would be an estimated 600 total bugs to find. But in the second example above, A = 20, B = 30, and C = 18. In this case there would be an estimated 33 1/3 bugs.
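The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names `lincoln_index` and `estimated_remaining` are made up for this example, and the "remaining" figure is an extra step not spelled out in the text. It assumes the distinct bugs found so far number A + B - C and subtracts that from the estimated total.

```python
def lincoln_index(a: int, b: int, c: int) -> float:
    """Estimate the total number of bugs as A*B/C.

    a: bugs found by the first tester
    b: bugs found by the second tester
    c: bugs found by both testers
    """
    if c <= 0:
        # With no overlap the formula divides by zero, so no estimate exists.
        raise ValueError("need at least one bug in common to form an estimate")
    return a * b / c


def estimated_remaining(a: int, b: int, c: int) -> float:
    """Estimated total minus the distinct bugs already found (A + B - C)."""
    return lincoln_index(a, b, c) - (a + b - c)


# The two scenarios from the text: A = 20, B = 30, with C = 1 and C = 18.
print(lincoln_index(20, 30, 1))         # 600.0
print(lincoln_index(20, 30, 18))        # about 33.33
print(estimated_remaining(20, 30, 1))   # 551.0
print(estimated_remaining(20, 30, 18))  # about 1.33
```

Raising an error when C is zero is a design choice for this sketch. With no bugs in common, the formula gives no finite estimate, which is itself a hint that there may be many bugs left.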

How do you find 1/3 of a bug? You might find 1/3 of a bug in your soup, but you won’t find 1/3 of a software error. The Lincoln Index is a simple mathematical model, so it can’t tell you exactly how many bugs there are. It makes a couple of simplifying assumptions, namely that testers find bugs at random and independently of each other. That won’t be exactly true in practice, which means we should be skeptical of the number it generates. Still, the Lincoln Index gives us an estimate to run with, which is much better than saying, “I have no idea how many bugs there are.”

Here’s the math behind the Lincoln Index: Suppose there are N bugs total, and the first tester has a probability p of finding each bug. Then she would find around A = Np bugs. If the second tester has a probability q of finding each bug, he would find around B = Nq. And if they find bugs independently, they’d find around C = Npq bugs in common. Then AB/C should be around (Np)(Nq)/(Npq) = N, the number of bugs. The probabilities p and q cancel out of the ratio, so you never need to know either of them.
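One way to check this reasoning is a small simulation. The sketch below is not from the original analysis: the function name `simulate_lincoln`, the values N = 1000, p = 0.3, q = 0.5, and the fixed seed are all arbitrary choices for illustration. It assumes exactly the idealized model above, where each tester finds each bug independently with a fixed probability.

```python
import random


def simulate_lincoln(n_bugs: int, p: float, q: float, rng: random.Random) -> float:
    """Simulate two independent testers and return the estimate A*B/C."""
    # Each tester finds each bug independently with probability p (or q).
    found_a = {bug for bug in range(n_bugs) if rng.random() < p}
    found_b = {bug for bug in range(n_bugs) if rng.random() < q}
    common = found_a & found_b
    if not common:
        raise ValueError("no bugs in common, so no estimate is possible")
    return len(found_a) * len(found_b) / len(common)


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = simulate_lincoln(1000, 0.3, 0.5, rng)
print(f"true total: 1000, Lincoln Index estimate: {estimate:.0f}")
```

The estimate should land near the true total without the code ever using p or q in the final formula. Real testers are not this tidy, so a simulation like this shows the model is self-consistent, not that it fits any particular team.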

Conclusion

While the software QA process will never be as simple as a “spot the differences” puzzle, for certain types of software development, there are simple tools to help estimate that elusive total number of bugs. Using the Lincoln Index, we can turn the overlap between two testers’ findings into an estimate of the total, and from that get a sense of how many bugs are left to uncover. With this rough estimate, we can make data-driven decisions about how to best allocate our resources.

John D. Cook

John helps companies make better decisions by taking advantage of the data they have, combining it with expert opinion, creating mathematical models, overcoming computational difficulties, and interpreting the results. Connect with John on his website or on Twitter.