If you’re feeling sick and you want to know what’s wrong with you, there’s an app for that. But the diagnosis won’t be as accurate as the one you’d get from a doctor – not by a long shot.
In a head-to-head comparison, real human physicians outperformed a collection of 23 symptom-checker apps and websites by a margin of more than 2 to 1, according to a report published Monday in the journal JAMA Internal Medicine.
Even when the contestants got three chances to figure out what ailed a hypothetical patient, the diagnostic software lagged far behind actual doctors. Indeed, the apps and websites suggested the right diagnosis only slightly more than half of the time, the report says.
The research team – from Harvard Medical School, Brigham & Women’s Hospital in Boston and the Human Diagnosis Project in Washington, D.C. – asked 234 physicians to read through a selection of 45 “clinical vignettes” to see how they would handle these hypothetical patients. Each vignette included the medical history of the “patient” but no results from a physical exam, blood test or other kind of lab work.
Most of the doctors were trained in internal medicine, though the group included some pediatricians and family practice physicians too. About half of them were in residency or fellowship, so their training was not yet complete.
Even so, of the 1,105 vignettes they considered, they listed the correct diagnosis first 72 percent of the time, according to the study.
The 23 symptom checkers evaluated a total of 770 vignettes in an earlier study by some of the same researchers. The apps and websites (including several from professional medical organizations, such as the American Academy of Physicians, the American Academy of Pediatrics and the Dutch College of General Practitioners) listed the correct diagnosis first just 34 percent of the time.
Both the doctors and the computer programs were able to include more than one ailment in their differential diagnosis. So the researchers also compared how often the correct diagnosis was among the top three responses.
For the doctors, that happened 84 percent of the time. For the symptom checkers, it was 51 percent of the time.
Though the humans trounced the computers across the board, there were situations in which the doctors did a particularly good job of naming the correct diagnosis first. For instance, their margin in cases with common conditions was 70 percent to 38 percent. In cases with uncommon conditions, it grew to 76 percent to 28 percent.
The seriousness of the malady made a difference too. In cases with low acuity, doctors bested software by 65 percent to 41 percent. But in cases with high acuity, that gap widened to 79 percent to 24 percent.
“Physicians vastly outperformed computer algorithms in diagnostic accuracy,” the researchers concluded. Full disclosure: Three of the study authors are doctors, and none are apps.