How Many Bugs Are Left? The Software QA Puzzle

Software QA with the Lincoln Index

If we knew at the start of software QA exactly how many bugs there were, it might be easier to find them all. Many puzzle books, for example, show you a drawing and ask you to find an exact number of hidden objects. Or they show you a pair of drawings and ask you to spot a certain number of differences between them.

In software development, the target number of differences — or bugs — isn’t always so specific. How much easier would software QA be if someone could whisper in our ear how many bugs there are to find? In reality, we never know. We only know a minimum: if we’ve found 37 bugs, we know there are at least 37 bugs. Maybe there’s one more to find, or maybe there are hundreds — we can’t be sure.

It would be helpful to have a rough idea of how many bugs remain, so we could best utilize our testing resources. If we always assume there are hundreds of bugs left to find, we may be over-testing one product at the expense of others. If we have reason to suspect we’ve found all the bugs, or nearly all of them, we can redistribute our resources to make the best use of their time and efforts.

Rough Estimates

Nothing can tell you exactly how many bugs are left during software QA, but a little math can give you an estimate. Suppose you have a tester who has found some number of bugs. If you knew how efficient she was at finding bugs, that is, the probability that she finds any given bug during the testing period, you could estimate how many bugs there are in total. But you can’t know what proportion of the bugs she’s found without knowing how many there are to find.
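As a worked version of that idea: if a tester finds each bug independently with probability p, she would be expected to find about Np of the N bugs, so her count A could be turned into an estimate of N. The value p = 0.5 below is purely hypothetical.

```latex
A \approx Np \quad\Longrightarrow\quad \hat{N} = \frac{A}{p},
\qquad \text{e.g. } A = 20,\ p = 0.5 \;\Rightarrow\; \hat{N} = \frac{20}{0.5} = 40
```

In practice, p is exactly the quantity we don’t know.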

Now suppose you have another tester, and he’s also found some number of bugs. You don’t know what proportion of bugs he’s found, either. But by combining the bug counts from the two testers, you can estimate how many bugs there are.

This may seem too good to be true, but it’s not — the key is to consider how many bugs both testers found. Suppose the first tester found 20 bugs and the second found 30, but there was only one bug that both found. You might suspect that there are a lot of bugs to find, since both could find so many with hardly any overlap. On the other hand, if 18 of the bugs on the first list are on the second list, you might feel like your testers have probably found most of them.

The Lincoln Index

You can quantify an estimate with a tool known as the “Lincoln Index.” If the first tester found A bugs, the second found B, and C bugs were in common between them, the estimated total number of bugs would be AB/C. In the first example above, A = 20, B = 30, and C = 1. In this case there would be an estimated 600 total bugs to find. But in the second example above, A = 20, B = 30, and C = 18. In this case there would be an estimated 33 1/3 bugs.
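Here is a minimal Python sketch of that calculation, using the two examples above. The function names `lincoln_index` and `estimated_remaining` are illustrative, not from any library. The second function is a small extension of the arithmetic: the two testers have found A + B - C distinct bugs between them, so subtracting that from the Lincoln estimate gives a rough count of bugs still undiscovered. `Fraction` keeps the 33 1/3 exact instead of rounding it.

```python
from fractions import Fraction


def lincoln_index(a: int, b: int, c: int) -> Fraction:
    """Estimate the total number of bugs as A*B/C.

    a: bugs found by the first tester
    b: bugs found by the second tester
    c: bugs found by both testers
    """
    if c <= 0:
        raise ValueError("no overlap: the Lincoln Index is undefined when C = 0")
    if c > min(a, b):
        raise ValueError("overlap cannot exceed either tester's count")
    return Fraction(a * b, c)


def estimated_remaining(a: int, b: int, c: int) -> Fraction:
    """Estimated bugs that neither tester has found yet.

    Between them, the testers have found A + B - C distinct bugs.
    """
    return lincoln_index(a, b, c) - (a + b - c)


# The two examples from the text: little overlap vs. heavy overlap.
for a, b, c in [(20, 30, 1), (20, 30, 18)]:
    total = lincoln_index(a, b, c)
    left = estimated_remaining(a, b, c)
    print(f"A={a}, B={b}, C={c}: total ~ {float(total):.2f}, remaining ~ {float(left):.2f}")
```

Running it reports a total of about 600 with about 551 bugs remaining for the first example, and about 33.33 with about 1.33 remaining for the second. In the second case, the estimate suggests the testers have found nearly everything.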

How do you find 1/3 of a bug? You might find 1/3 of a bug in your soup, but you won’t find 1/3 of a software error. The Lincoln Index is a simple mathematical model, so it can’t tell you exactly how many bugs there are. It makes a couple of simplifying assumptions, namely that testers find bugs at random and independently of each other. That won’t be exactly true in practice, which means we should be skeptical of the number it generates. Still, the Lincoln Index gives us an estimate to work with, which is much better than saying, “I have no idea how many bugs there are.”

Here’s the math behind the Lincoln Index: Suppose there are N bugs total, and the first tester has a probability p of finding each bug. Then she would find around A = Np bugs. If the second tester has a probability q of finding each bug, he would find around B = Nq. And if they find bugs independently, they’d find around C = Npq bugs in common. Then AB/C should be around (Np)(Nq)/(Npq) = N, the number of bugs. The unknown probabilities p and q cancel out, so we never need to know them.
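The cancellation argument can be checked with a quick simulation, sketched here in Python under exactly the model’s assumptions: each bug is found by the first tester with probability p and, independently, by the second with probability q. The function `simulate_lincoln` and its parameters are hypothetical names for illustration. Rounds with no overlap (C = 0) are skipped because AB/C is undefined there.

```python
import random


def simulate_lincoln(n_bugs: int, p: float, q: float,
                     trials: int = 2000, seed: int = 1) -> float:
    """Average Lincoln estimate AB/C over simulated testing rounds.

    Assumes (as the model does) that each of n_bugs bugs is found by
    tester 1 with probability p and, independently, by tester 2 with
    probability q.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    estimates = []
    for _ in range(trials):
        a = b = c = 0
        for _ in range(n_bugs):
            found1 = rng.random() < p
            found2 = rng.random() < q
            a += found1                 # bools count as 0 or 1
            b += found2
            c += found1 and found2      # found by both testers
        if c > 0:  # AB/C is undefined with no overlap; skip those rounds
            estimates.append(a * b / c)
    if not estimates:
        raise ValueError("no round had any overlap; try more bugs or higher p, q")
    return sum(estimates) / len(estimates)


# Different tester efficiencies, same true N = 200.
print(round(simulate_lincoln(200, 0.5, 0.4), 1))
print(round(simulate_lincoln(200, 0.3, 0.6), 1))
```

Both calls should print values near 200 even though the testers’ efficiencies differ. The averages won’t land exactly on N, since AB/C is a ratio of random counts. And because independence is built into the simulation, it says nothing about how far real testers depart from that assumption.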


While the software QA process will never be as simple as a “spot the differences” puzzle, for certain types of software development there are simple tools to help estimate that elusive number of total bugs. Using the Lincoln Index, we can measure how much our testers’ findings overlap, which indicates roughly what proportion of the bugs remain to be uncovered. With this rough estimate, we can make data-driven decisions about how best to allocate our resources.

John D. Cook

John helps companies make better decisions by taking advantage of the data they have, combining it with expert opinion, creating mathematical models, overcoming computational difficulties, and interpreting the results. Connect with John on his website or on Twitter.

9 thoughts on “How Many Bugs Are Left? The Software QA Puzzle”

  1. I think the independence assumption is unlikely to hold in most cases. There will be rare paths that neither of them will test, and common paths that both are likely to test.

  2. I agree that testers won’t be completely independent. Even so, the method gives you some idea what to expect. You might try calibrating it for your experience. Maybe it consistently underestimates the number of bugs by a factor of 2, for example. If so, then double the estimate!

  3. I applaud the effort to quantify bugs remaining, but I do feel this is in fact too good to be true. Here’s why. Rarely, if ever, would a test lead have two testers testing the same User Story, due to efficiency and timing in real-world projects. A lead will plan for testers on a team to divide and conquer the work items brought into a sprint. That means that Tester A and B are testing independent items and the overlap C will be very small if not zero. Since the User Stories are implemented in independent modules of code by different developers, the Lincoln Index won’t help. That said, IF (big if) time were to permit, having Testers A and B swap and test each other’s User Stories, we might be able to get something meaningful out of the index proposed, but unfortunately that would double the delivery time and there is little management appetite for such an academic exercise in practice. Even the calibration suggestion falls short, since we will never know what factor we’re off by, because we never really know the number of bugs in any application complex enough to be of any value.
