The chance of having a false positive result after 10 yearly mammograms is about 50-60 percent.
–Susan G. Komen Foundation
At the dawn of the 18th century, Joshua Bayes, a nonconformist Presbyterian minister, welcomed his son Thomas into the world. Reverend Bayes’ dissent from the official dogma and practice of the Church of England influenced young Thomas to think for himself, challenge prevailing assumptions, and look to underlying principles. Three centuries later, Thomas’ insights into the then-emerging field of probability theory and statistics have found renewed relevance with the explosion of diagnostic testing.
A startling ramification of Bayes’ Theorem that I wish to explore with you is this: a medical screening method can be highly accurate and yet, even when a test result says we’re ill, the chances that we are disease-free can still be strongly in our favor!
How and when does this paradox arise? I will explain with a hypothetical scenario and follow with a discussion of mammography as an example that touches millions of lives each year.
To set the stage, we need to know that the accuracy of any ‘yes/no’ medical screening test is judged by its sensitivity (how well it identifies those who truly have the condition) and its specificity (how well it properly excludes those who truly do not have the condition). For a medical screening test to be accurate each and every time, it must be both 100 percent sensitive and 100 percent specific.
When sensitivity is less than 100 percent, some ill people will be missed by the test (false negatives). When specificity is less than 100 percent, some healthy people will be incorrectly identified as ill (false positives).
The following scenario elucidates how and when a highly accurate diagnostic test can still be wrong more often than not when it comes to identifying patients who truly have a disease.
Let us suppose that you and I are members of a group of 1,000 people. Two percent of our group (20 individuals) have a curable but life-threatening disease. But we don’t know who. To find out who needs medical intervention, everyone is screened using a diagnostic test that gives correct results nine times out of ten.
On average, what test results will the group get?
Of the 20 who truly have the disease, the test, being 90 percent sensitive, will correctly identify 18 as ‘true positives.’ That is, the test results say they are ill and they truly are. The other two will be missed as ‘false negatives.’
Of the remaining 980 fortunate folks who do not have the disease, the test—being 90 percent specific—will generate 98 ‘false positives.’ In other words, the test results will say 10 percent of the healthy members of the group are ill even when they aren’t.
In total, out of the thousand people tested, 116 (11.6 percent) will test positive for the disease: 18 true positives + 98 false positives = 116 total positives.
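The tally above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the variable names are mine, and it assumes that ‘correct nine times out of ten’ means both 90 percent sensitivity and 90 percent specificity, as the scenario does.

```python
# Expected screening outcomes for the hypothetical group of 1,000.
group_size = 1000
prevalence = 0.02     # 2 percent of the group is ill
sensitivity = 0.90    # assumed: 90 percent of ill people test positive
specificity = 0.90    # assumed: 90 percent of healthy people test negative

ill = round(group_size * prevalence)
healthy = group_size - ill

true_positives = round(ill * sensitivity)      # ill and flagged as ill
false_negatives = ill - true_positives         # ill but missed
true_negatives = round(healthy * specificity)  # healthy and cleared
false_positives = healthy - true_negatives     # healthy but flagged as ill
total_positives = true_positives + false_positives

print(f"true positives:  {true_positives}")
print(f"false positives: {false_positives}")
print(f"total positives: {total_positives} of {group_size}")
```

Changing the three inputs at the top shows how quickly the balance between true and false positives shifts.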
Now suppose you and I have just been told our results, and we both got the heart-stopping news that we tested positive. If we can gather our wits about us, among the many things to think about, we ought to ask: what are the odds that either of us actually needs medical treatment? Why? Because under this scenario, chances are we don’t! Here’s what I mean.
For those 116 people who received a positive test result, the odds of being disease-free are the ratio of the number of false positives to the number of true positives. These 98:18 odds (better than 5 to 1) mean that anyone testing positive still has a very good chance of not having the disease. This happens because, for the purpose of identifying those who truly have the disease, a positive result is wrong 84.5 percent of the time (98/116 = 84.5%).
Here are my main takeaway points. The results of a ‘yes/no’ test are not black and white. Pre-test, because we didn’t know which 20 people had the disease, everybody in our group of a thousand had a 2 percent probability of having it. Post-test, the 116 people who tested positive saw their chances of having the disease rise from 2 percent to around 16 percent (18/116 = 15.5%). The remaining members of the group who tested negative saw their chances drop significantly below 2 percent. (Go ahead and calculate their chances. Leave a comment at my article online and I will tell you whether we agree!).
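For readers who prefer to work with probabilities rather than head counts, here is one way the same reasoning might be written as Bayes’ Theorem in Python. The function name and its parameters are my own illustration, not a standard library routine, and the inputs are the assumed figures from the scenario.

```python
def post_test_probability(prevalence, sensitivity, specificity, positive=True):
    """Probability of truly having the disease, given the test result."""
    if positive:
        true_pos = sensitivity * prevalence
        false_pos = (1 - specificity) * (1 - prevalence)
        return true_pos / (true_pos + false_pos)
    # Negative result: ill people the test missed vs. healthy people it cleared.
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    return false_neg / (false_neg + true_neg)


# Scenario values: 2 percent prevalence, 90 percent sensitivity and specificity.
after_positive = post_test_probability(0.02, 0.90, 0.90)
print(f"chance of disease after a positive result: {after_positive:.1%}")
```

Calling the function with positive=False is one way to check your own answer for those who tested negative.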
Now on to mammography.
Mammography is the most widely used breast cancer screening method in the U.S. Roughly 18 million mammograms are performed annually, and about a quarter of a million new cases of breast cancer are identified each year. Taken together, these data indicate that the annual incidence of new breast cancer cases among screened women is less than 2 percent (250,000 out of 18 million is about 1.4 percent).
According to the Centers for Disease Control and Prevention (CDC), “Of those who get screened, 16 percent will get called back for further testing if it’s their first mammogram, and 10 percent will be called after subsequent mammograms. Fortunately, very few of those who are called back will end up having cancer.”
Putting this another way, the CDC is saying that roughly two million women each year will receive the difficult news that their mammogram indicated an abnormality requiring follow-up testing, even though they are healthy. And as stated by the Komen Foundation, over the course of having 10 mammograms, a woman is more likely than not to receive an upsetting test result even though she has been cancer-free the whole time. These are direct consequences of Bayes’ Theorem.
Similar to the situation in our hypothetical scenario, an initial mammogram that comes up positive increases a woman’s probability of having breast cancer from the general baseline range of 1-2 percent to perhaps 10-20 percent depending upon signs, symptoms, medical history, race, age, and the overall clinical picture of the patient.
If you have had a hard time following along with Bayesian reasoning and its implications for interpreting diagnostic test results, you are in good company: so do many physicians.
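A rough sketch of how small yearly risks add up to a figure like Komen’s is given below. It rests on two simplifying assumptions of mine: each screening is independent of the others, and the false-positive rate is the same every year. The 7 and 9 percent rates are illustrative values chosen for the example, not figures from the CDC or Komen, and the CDC callback rates quoted above are not pure false-positive rates.

```python
def chance_of_any_false_positive(per_screen_rate, screenings=10):
    """Chance that a cancer-free woman gets at least one false positive.

    Assumes every screening is independent and has the same
    false-positive rate (a simplification, for illustration only).
    """
    chance_of_none = (1 - per_screen_rate) ** screenings
    return 1 - chance_of_none


for rate in (0.07, 0.09):  # hypothetical per-screening false-positive rates
    cumulative = chance_of_any_false_positive(rate)
    print(f"{rate:.0%} per screening -> {cumulative:.0%} over 10 screenings")
```

Under these assumptions, a per-screening false-positive rate of only 7 to 9 percent is enough to land in the 50-60 percent range over ten mammograms.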
The Bristol School of Social and Community Medicine in the U.K. has led the first-ever systematic review of how well health professionals interpret diagnostic information. Their study concluded that “test accuracy measures including sensitivity and specificity are not well understood” by health professionals. There was little evidence in the literature “of successful application of Bayesian reasoning: most studies suggested that post-test probability estimation is poor with wide variability and a tendency to overestimation for both positive and negative results.”
In conclusion, way too many diagnostic tests are ordered to ‘rule in’ or ‘rule out’ a medical condition. That’s not what many screening tests are capable of doing. What they are useful for is refining probabilities. This is the general principle Thomas Bayes bequeathed to us three centuries ago.
Ideally, before diagnostic tests are ordered, patients and physicians should discuss to what extent a positive or negative test result changes the likelihood of a diagnosis. This ‘post-test’ probability estimation depends not only upon the accuracy of the test and the medical history of the patient, but also on the prevalence of the disease being tested for. As a consequence, doctor-patient conversations become even more important when discussing an affliction found infrequently among the general population. Bayesian reasoning offers a way to do this. Armed with this information, we will make better informed medical decisions.
Life is tough enough as it is. Following the principles of Bayes’ Theorem, we have an enormous opportunity to avoid unwarranted anxiety and pain in our lives. And in doing so, we will save billions of precious healthcare dollars as we focus them more productively. Let’s up our game.
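To see how strongly prevalence drives the meaning of a positive result, consider the short Python sketch below. It holds the test fixed at the 90 percent sensitivity and specificity assumed in the hypothetical scenario and varies only how common the disease is. The function name and the prevalence values are illustrative choices of mine, not data from any study.

```python
def chance_ill_given_positive(prevalence, sensitivity=0.90, specificity=0.90):
    """Bayes' Theorem: probability of disease after a positive result."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same test, increasingly common disease (illustrative prevalence values).
for prevalence in (0.001, 0.02, 0.10, 0.50):
    post_test = chance_ill_given_positive(prevalence)
    print(f"prevalence {prevalence:.1%} -> post-test probability {post_test:.1%}")
```

The rarer the condition, the less a single positive result from the same test should alarm us, which is exactly why the conversation matters most for infrequent diseases.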