There's more patient data out there than ever, but big data can create big problems.
A new column in JAMA poses an interesting question: How can healthcare stratify populations in an industry increasingly obsessed with precision medicine and big data?
Arjun K. Manrai, PhD, and Chirag J. Patel, PhD, both of Harvard Medical School, co-authored the piece with Stanford University’s John P. A. Ioannidis, MD, DSc. Their main concern is the concept of “normality” as it applies to clinical laboratory tests.
The trio writes that even existing reference standards might need to be reevaluated as new information becomes available. Because data can now be sliced into ever more specific subgroups, each reference set ends up drawing on fewer and fewer unique individuals. The Clinical and Laboratory Standards Institute recommends using at least 120 individuals to establish a reference interval, but in practice that doesn’t always happen.
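To make “establishing a reference” concrete, here is a minimal sketch of one common approach: take the central 95% of values from a healthy reference group, using a rank-based (nonparametric) estimate of the 2.5th and 97.5th percentiles. The 95% convention, the function name `reference_interval`, and the synthetic values below are illustrative assumptions, not the column’s method; the 120-person floor is the CLSI figure cited above.

```python
import random
import statistics

MIN_REFERENCE_INDIVIDUALS = 120  # CLSI-recommended minimum cited in the article


def reference_interval(values):
    """Return a central 95% reference interval (2.5th, 97.5th percentiles).

    Hypothetical helper for illustration. statistics.quantiles with n=40
    yields cut points every 2.5%; method="exclusive" places the p-th
    percentile at rank p * (N + 1) and interpolates between neighbours.
    """
    if len(values) < MIN_REFERENCE_INDIVIDUALS:
        raise ValueError(
            f"need at least {MIN_REFERENCE_INDIVIDUALS} individuals, got {len(values)}"
        )
    cuts = statistics.quantiles(values, n=40, method="exclusive")
    return cuts[0], cuts[-1]


# Synthetic, made-up analyte values for 120 hypothetical reference individuals.
random.seed(0)
sample = [random.gauss(5.4, 0.3) for _ in range(120)]
low, high = reference_interval(sample)
print(f"reference interval: {low:.2f} to {high:.2f}")
```

With only 120 people, the two cut points rest on the handful of most extreme observations in each tail, which is one reason splitting a reference group into smaller strata makes the interval less stable. `statistics.quantiles` requires Python 3.8 or later.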
Multiplicity, or running many comparisons at once, is another complicating factor. When an analyte such as blood HbA1c is examined across numerous subpopulations, “many differences are likely to be detected if there is not a correction for the number of comparisons performed.”
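The arithmetic behind that warning is short. If each comparison uses a 5% significance level and, as an assumption for this sketch, the comparisons are independent and no real differences exist, the chance of at least one false positive is 1 − (1 − α)^m for m comparisons. The simplest correction, Bonferroni, divides α by m. The function names here are hypothetical and chosen for illustration.

```python
def family_wise_error_rate(alpha, m):
    """Probability of at least one false positive across m independent tests
    when every null hypothesis is true: 1 - (1 - alpha) ** m."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-comparison significance level that keeps the family-wise
    error rate at or below alpha (holds even without independence)."""
    return alpha / m


for m in (1, 10, 300):
    print(m, round(family_wise_error_rate(0.05, m), 4), bonferroni_threshold(0.05, m))
```

At 10 comparisons the chance of a spurious “significant” finding is already about 40%; at 300 it exceeds 99.99%.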
They give an example of how that can be problematic even with 120 reference cases. Classifying subjects by 5 races, 10 age groups, 2 genders, and 3 socioeconomic sets would produce 300 population strata. “Assuming there are no differences in the true analyte distributions...if 120 individuals are repeatedly [sampled] (eg, from the same subpopulation) the phantom appearance of statistically significant differences will almost always be produced (many of which might seem to also have clinical relevance) even when none exist,” they write. That problem becomes even more pronounced with fewer than 120 reference individuals.
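A small simulation shows these “phantom” differences appearing. In this sketch every one of the 5 × 10 × 2 × 3 = 300 strata draws 120 people from the same normal distribution, so any significant result is a false positive by construction. The distribution parameters (mean 5.4, SD 0.3), the known-variance z-test, and the function name are assumptions made for illustration, not the authors’ analysis.

```python
import math
import random
import statistics

N_STRATA = 5 * 10 * 2 * 3  # races x age groups x genders x socioeconomic sets = 300


def simulate_phantom_differences(n_strata=N_STRATA, n_per_stratum=120, alpha=0.05,
                                 mu=5.4, sigma=0.3, seed=1):
    """Count strata whose mean 'differs significantly' from the true mean when
    every stratum is drawn from the SAME distribution N(mu, sigma).

    Hypothetical helper: each stratum gets a two-sided z-test with known sigma.
    Returns (uncorrected rejections, Bonferroni-corrected rejections).
    """
    rng = random.Random(seed)
    std_normal = statistics.NormalDist()
    raw_hits, bonferroni_hits = 0, 0
    for _ in range(n_strata):
        sample = [rng.gauss(mu, sigma) for _ in range(n_per_stratum)]
        z = (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n_per_stratum))
        p = 2 * (1 - std_normal.cdf(abs(z)))
        raw_hits += int(p < alpha)
        bonferroni_hits += int(p < alpha / n_strata)
    return raw_hits, bonferroni_hits


raw, corrected = simulate_phantom_differences()
print(f"uncorrected 'significant' strata: {raw} of {N_STRATA}")
print(f"after Bonferroni correction:      {corrected} of {N_STRATA}")
```

Because the null is true everywhere, the uncorrected count should hover around 5% of 300 (about 15 strata) from run to run, while the corrected count will usually be zero.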
But the same massive datasets that create these headaches might also contain their cure. Longitudinal outcomes data could be linked at the individual level to test differences in reference intervals, and large databases enable cross-set analyses that account for multiple testing. Manrai, Patel, and Ioannidis also write that tailored definitions of “normal” could be made based on more patient attributes and delivered to doctors at the point of care.
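The column does not name a specific multiple-testing method. As one widely used option, the Benjamini–Hochberg procedure controls the false discovery rate (the expected share of flagged differences that are false) rather than the chance of any single error, which makes it less conservative than Bonferroni when many strata are compared. The sketch below is a plain implementation under that assumption; the function name is hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at false discovery rate q.

    Sort p-values ascending, find the largest rank k with p_(k) <= (k / m) * q,
    and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank  # step-up: keep the largest qualifying rank
    return sorted(order[:k_max])


p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_vals))
```

For these ten illustrative p-values, Benjamini–Hochberg flags the first two, while a Bonferroni threshold of 0.05 / 10 = 0.005 would flag only the first.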
Lastly and “perhaps most emblematic of the precision medicine movement,” they suggest that genetic ancestry allows for more specific stratification of patient attributes than a doctor’s assessment that a patient falls into 1 of a handful of races, which could help bypass “problematic conflation of race and ancestry ubiquitous in health care data.”