Although it is true that data streams are now quite voluminous and opportunities abound, there are plenty of caveats.
“Big data” has been described by various futurists and health industry pundits as the source of the answers to all problems in health care. The promise of vast amounts of data about patients, doctors, and hospitals that can be sorted and summarized in numerous ways has led some stakeholders to conclude that the reasons for any problem can be determined and the solution predicted with great certainty. The reality: although it is true that the data streams are now quite voluminous and opportunities abound, there are plenty of caveats to the notion of elegant and accurate data sorting and analysis, as well as the prediction of outcomes.
Big data describes any large data set that has the potential to be mined for information. It is high in volume and wide in variety, can be quickly generated and aggregated, and is often (regrettably) incompatible with different databases. Big data allows for faster identification of high-risk patients, more effective interventions, and closer monitoring. It has earned the label of “big” because it comes from so many more sources than in the past. Everything everyone does is now capable of being stored in, and potentially recalled from, a computer, cell phone, tablet, etc. Every clinical outcome or lab, charge, cost, and provider identity can be both stored and combined with another database for any patient, disease, or doctor. Detailed reports can be prepared for any category and drilled down to any population subset within seconds rather than days.
Big data now comes from many diverse corners of the health care system: research from drug manufacturers, digitized patient records, clinical trial information, and claims databases from public payers such as Medicare and Medicaid. In addition, an individual patient’s clinical data now come from a variety of sources as well: payers, hospitals, outpatient clinics, doctors’ offices, and patients themselves. Electronic medical records (EMRs) have become a major source of data thanks to federal incentives. With EMRs, every lab, drug, intervention, order sheet, physician order, progress note, and (potential) clinical outcome is available for aggregation by population and for identification of future trends.
Big Data Takeaways
Big Data Caveats
Examples of Contemporary Uses of Big Data
Big Data vs Big Evidence: What’s the Difference?
Doing research is just like cooking food your guests will enjoy and want more of: you need the right ingredients (data) and a method of preparing and combining them (research and statistics) to create the end product (evidence). Until all the ingredients come together properly, you do not have something worthy of presentation. According to the International Society for Pharmacoeconomics and Outcomes Research Task Force Report on Real World Data, “Evidence is generated according to a research plan and interpreted accordingly, whereas data is but one component of the research plan. Evidence is shaped, while data simply are raw materials and alone are noninformative.”1
Much has been written in recent years suggesting that medical decision making should be evidence-based. Evidence-based medicine (EBM) has been defined as “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.”2 Although EBM requires data, it also requires rigor in both the collection and the analysis of the data so that the conclusions are not due to bias, statistical sampling error, or errors in data analysis. Simply copying data to a spreadsheet and looking at the average values may be quick, but it may also be inaccurate. No matter how much data exist, researchers still need to ask the right questions to create a hypothesis, design a test, and use the data to determine whether their hypothesis is true.
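To see why a quick average can mislead, consider a minimal Python sketch. The lengths of stay below are hypothetical numbers, and a single outlier is only one of several ways an average can go wrong; it is assumed here purely for illustration.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; the last patient is an outlier.
lengths_of_stay = [2, 3, 3, 4, 3, 2, 45]

average_stay = mean(lengths_of_stay)    # pulled upward by the single long stay
typical_stay = median(lengths_of_stay)  # middle value, unaffected by the outlier

print(f"mean: {average_stay:.1f} days, median: {typical_stay} days")
# prints "mean: 8.9 days, median: 3 days"
```

Six of the 7 patients stayed 4 days or fewer, yet the average suggests a stay of almost 9 days.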
Self-Reported Data
Big data includes data from patients or unverified sources: patient registries, social media, and government sites that allow users and providers to enter data directly. These data can be aggregated and sorted anonymously or, with patient permission, be tied directly to objective clinical data, charges, cost of care, their disease states, and the medications they are taking. They can be used to measure a patient’s quality of life; their experience with physicians, hospitals, or other providers; or even home monitoring. Using GPS-enabled devices and smartphone apps, it is possible to directly report heart rate, blood pressure, arrhythmias, medication use or refill information, and blood glucose levels.
Data Mining and Correlations
A trend seen among less-experienced database users is the improper use of data mining. Individuals may search and re-sort the database until they find something that looks significant, even if it seems illogical. There are 3 problems with this approach:
The biggest risk of error from these correlation conclusions is inferring cause and effect. When 2 things occur together (which is all that correlation confirms), the researcher has the chance to show bias by declaring which happened first, naming which is the cause and which is the effect.
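The danger of searching until something looks significant can be shown with a small simulation. The sketch below, written in Python, assumes 100 hypothetical patients, 1000 randomly generated variables that have nothing to do with the outcome, and a correlation cutoff of about 0.197 (roughly the 2-sided 5% significance level for a sample of 100). All of these values are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_spurious_hits(n_patients=100, n_variables=1000,
                        r_cutoff=0.197, seed=1):
    """Count purely random variables that pass the correlation cutoff."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_patients)]
    hits = 0
    for _ in range(n_variables):
        # Each variable is random noise, unrelated to the outcome by design.
        unrelated = [rng.gauss(0, 1) for _ in range(n_patients)]
        if abs(pearson_r(unrelated, outcome)) > r_cutoff:
            hits += 1
    return hits

hits = count_spurious_hits()
print(f"{hits} of 1000 unrelated variables look 'significant' by chance")
```

Because the cutoff corresponds to a 5% significance level, roughly 1 in 20 of the unrelated variables should be expected to pass it. A researcher who re-sorts the database often enough will eventually find one of them.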
Predictive Modelling and Confounders
Big data by itself has limited value. Its usefulness lies in the ability of pharmacy stakeholders to determine the trends and relationships between data points for any single population member. There are always “hidden variables,” or confounders, that may not be seen in the data but could be important predictors of the outcome. These confounders include other concurrent therapy, severity of an illness, standard of care, concurrent diseases, and a patient’s genetic makeup.
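A short Python sketch shows how a confounder such as severity of illness can reverse a conclusion. The treatments "A" and "B" and the counts are illustrative values chosen to show the effect, not data from this article.

```python
# (treatment, severity) -> (patients recovered, patients treated)
# Illustrative counts only; "A" and "B" are hypothetical treatments.
counts = {
    ("A", "mild"): (81, 87),
    ("A", "severe"): (192, 263),
    ("B", "mild"): (234, 270),
    ("B", "severe"): (55, 80),
}

def recovery_rate(treatment, severity=None):
    """Recovery rate for a treatment, optionally within one severity group."""
    rows = [
        value
        for (t, s), value in counts.items()
        if t == treatment and (severity is None or s == severity)
    ]
    recovered = sum(r for r, _ in rows)
    treated = sum(n for _, n in rows)
    return recovered / treated

for severity in (None, "mild", "severe"):
    label = severity or "all patients"
    rate_a = recovery_rate("A", severity)
    rate_b = recovery_rate("B", severity)
    print(f"{label}: A = {rate_a:.0%}, B = {rate_b:.0%}")
```

Pooled across all patients, treatment B appears better (83% vs 78%). Within each severity group, however, treatment A has the higher recovery rate. The reversal occurs because A was given mostly to severe cases, and a database that does not record severity would hide it.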
A field of science called “predictive analytics” is used to predict how a situation will play out in the future based on results from the past. A prediction can be used to treat an entire population similar to the one studied or can be used to tailor treatments for individual patients based on determining the provider, hospitals, or medications most likely to achieve a given outcome. In essence, big data serves to substitute the experience of thousands of similar patients who had a variety of outcomes for the clinical judgment of the treating physician.
Can Big Data Be Used to Tell Us Which Care Is Cost Effective?
To choose a cost-effective intervention (whether medications, clinical services, or devices), provider, or facility for a patient or group of patients, providers need to know the cost from the perspective of the user. Costs differ from the provider and payer perspectives, and a hospital's costs and its charges are not the same. We also need to know how effective an intervention is in achieving the primary clinical outcome, whatever that may be: a quicker cure, a longer life, a disability prevented, a successful surgery. Costs and efficacy may then be compared, and the following rules developed by the author of this article may be used to determine the most cost-effective treatment:
Potential errors in cost-effectiveness research include:
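One common way to compare the costs and effectiveness of 2 interventions is the incremental cost-effectiveness ratio (ICER): the extra cost divided by the extra benefit. The Python sketch below is a standard pharmacoeconomic calculation and is not necessarily the set of rules the author refers to. The function name, costs, and life-year figures are hypothetical.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """Incremental cost per additional unit of effect (new vs old).

    Costs must be measured from one consistent perspective (for example,
    the payer's), and effects in one consistent unit (for example, life-years).
    """
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("Equal effectiveness: compare costs directly instead.")
    return delta_cost / delta_effect

# Hypothetical example: the new therapy costs $12,000 and yields 5.0 life-years;
# the old therapy costs $9,000 and yields 4.5 life-years.
extra_cost_per_life_year = icer(12000, 5.0, 9000, 4.5)
print(f"${extra_cost_per_life_year:,.0f} per additional life-year")
# prints "$6,000 per additional life-year"
```

Whether $6,000 per additional life-year is acceptable depends on the decision maker's threshold. The ratio is also only as reliable as the cost and outcome data behind it.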
Conclusion
Big data seems to offer some real potential to improve the quality of care and related outcomes by helping to determine which procedures and providers offer cost-effective treatment. Database users need to know the information they will be accessing is complete and accurate. Big data needs to provide enough detail so that when users need to “drill down” to a specific treatment, patient category, or provider, the data can be accessed and summarized. Assuming that all the relevant databases can be tied together, providers will have more information from the multiple places where care has been provided: pharmacists, labs, physicians, hospitals, nursing homes, emergency departments, and outpatient surgery centers.
Privacy and security will remain concerns: significant database breaches are reported weekly among large companies, so it will be important to ensure these data do not fall into the wrong hands. Consumers will always be concerned that some of these data may fall into the hands of an employer, insurance company, or even an ex-spouse and be used in a prejudicial manner. How to collect and sort these data while keeping them away from the “wrong” people and getting them to the “right” people will be a challenge for years to come.
The potential reward of big data is tremendous, but it coexists with the possibility of serious problems resulting from its misuse. Unverified data from patients; databases that cannot communicate and share information; the risk of missing, incomplete, or inaccurate information; and a lack of rigor in research, including false conclusions of cause and effect based on incorrect association or correlation, are all issues that must be addressed.
Lorne Basskin, PharmD, is a consultant on outcomes research, formulary decision making and pharmacoeconomics, and teaches in the School of Public Health at Brown University.