In the US, Healthcare Data Access Is a Scavenger Hunt

September 10, 2018

As researchers use big data to improve health, they’re bumping up against the hard limits of a fragmented healthcare system.

Image has been modified. Credit: monsitj - stock.adobe.com.

In the era of big data, obtaining data sets that paint a comprehensive picture of the American healthcare landscape is a big — if not impossible — challenge.

That’s not to say researchers aren’t trying.

“In a study … recently completed, we pulled together over 150 different data sources, and our data still weren’t completely representative of the U.S.,” said Joseph Dieleman, Ph.D., an assistant professor at the University of Washington’s Institute for Health Metrics and Evaluation.

His research focuses on healthcare data, the economics of healthcare and healthcare policy, all areas that rely on solid data. Yet in the U.S., it’s virtually impossible to get a robust, all-encompassing picture of Americans’ healthcare.

Trudy Krause, Dr.P.H., MBA, associate professor of management, policy and community health at UTHealth School of Public Health in Houston, said the way America’s healthcare system is constructed makes compiling data something like a scavenger hunt.

“Unlike countries with a national healthcare system, the U.S. healthcare system is fragmented by payer type,” she said.

There are the public programs and entities, like Medicare, Medicaid and the Department of Veterans Affairs (VA), and then there is a vast landscape of private insurers, selling plans on the open market or through employer-sponsored plans.

“Thus, there is no central data bank of claims data from the payers,” Krause said.

Scrambling for Health Data

Consequently, health data researchers are left with a series of calculations — literally and figuratively.

“There (are) data on Medicare beneficiaries, some data on Medicaid, private insurance and (much less data) on uninsured spending, although this is a smaller fraction of (the) health sector,” Dieleman told Healthcare Analytics News™. “Getting access to all these data is very difficult; analyzing them jointly in order to study the entire health sector is really challenging.”

Broadly speaking, data from the government programs are easier — if not easy — to obtain.

“Centers for Medicare & Medicaid Services (CMS) makes some data available to researchers through the Research Data Assistance Center,” Krause said, “but it is project limited, and there are fees.”

CMS also has the Qualified Entity Certification Program, which enables qualifying organizations to access data. However, the certification process takes time, and even with certification, researchers must pay for the data they use.

Because Medicaid varies on a state-by-state basis, access to Medicaid data is likewise hit-and-miss, Krause said.

However, even with a complete set of public-sector healthcare data, researchers would miss information on most Americans. According to the Kaiser Family Foundation, 56 percent of U.S. citizens had private insurance in 2016, either through an employer-sponsored plan or a nongroup plan. Add in the 9 percent of people who were uninsured, and government data account for just over a third of all patient information.
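The arithmetic behind that "just over a third" figure can be checked in a few lines. This is only a sketch: it assumes the private, uninsured and public categories are mutually exclusive and together cover everyone, which the article implies but does not state, and the function name is made up for illustration.

```python
# Coverage shares for 2016 as quoted above (Kaiser Family Foundation).
PRIVATE_PCT = 56
UNINSURED_PCT = 9


def public_share(private_pct: int, uninsured_pct: int) -> int:
    """Percent of people left for public programs.

    Assumption (not stated in the article): the three categories
    do not overlap and add up to 100 percent.
    """
    return 100 - private_pct - uninsured_pct


government_pct = public_share(PRIVATE_PCT, UNINSURED_PCT)
print(f"Covered by public programs: {government_pct}%")
```

Under those assumptions, the remainder is 35 percent, slightly more than one third.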

Furthermore, because programs like Medicare, Medicaid and VA initiatives are designed for specific populations, comprehensive government data would not give a representative sample of the public, experts said.

Given the limits of public healthcare data, researchers must either use complex weighting and equations to adjust the numbers or turn to the private sector for data. The problem is, data from commercial insurers or pharmacies are even harder to come by.
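One common form of such weighting is post-stratification, in which each group's average is weighted by its share of the real population rather than its share of the sample. The sketch below is an illustration only, not the method used by any researcher quoted here. The group names, sample sizes, spending figures and population shares are all hypothetical.

```python
def poststratify(sample_means, sample_counts, population_shares):
    """Return (unweighted_mean, weighted_mean) across groups.

    sample_means:      average value observed in each group
    sample_counts:     number of sampled people in each group
    population_shares: each group's true share of the population (sums to 1)
    """
    total = sum(sample_counts.values())
    # Naive estimate: every sampled person counts equally.
    unweighted = sum(
        sample_means[g] * sample_counts[g] for g in sample_counts
    ) / total
    # Post-stratified estimate: each group counts by its population share.
    weighted = sum(
        sample_means[g] * population_shares[g] for g in population_shares
    )
    return unweighted, weighted


# HYPOTHETICAL numbers: a sample drawn mostly from a program for older
# adults, so people 65 and over are heavily overrepresented.
means = {"under_65": 4000.0, "65_plus": 10000.0}   # made-up annual spending
counts = {"under_65": 200, "65_plus": 800}         # made-up sample sizes
shares = {"under_65": 0.85, "65_plus": 0.15}       # made-up population mix

naive, adjusted = poststratify(means, counts, shares)
print(f"Unweighted mean: {naive:.0f}")
print(f"Post-stratified mean: {adjusted:.0f}")
```

With these made-up inputs, the unweighted average is pulled toward the overrepresented older group, and the weighted figure corrects for that. The correction is only as good as the population shares fed into it, which is part of why researchers describe this work as difficult.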

“Most commercial carriers try very hard to protect their proprietary information that reveals contracting terms for providers, and thus (insurers) limit charge and payment information when providing data,” Krause said.

Worries about privacy add another layer of concern, causing insurers to remove identifying information from the data.

The end result, Krause said, is that most commercial insurers don’t share such data, and the few who do charge a considerable amount.

Another option, Dieleman said, is to simply conduct surveys of patients. Those data, though, “(have) many of (their) own challenges related to smaller sample size and respondents’ own reporting biases.”

Big Data from Vertical Integration

One research organization that does enjoy broad access to commercial claims data is the Kaiser Permanente Center for Health Research, in Portland, Oregon. The institution has ready access to claims data on more than 12 million commercially insured patients. It has this access for one important reason: It’s affiliated with Kaiser Permanente, the vertically integrated healthcare behemoth.

“Kaiser Permanente as a whole has a diverse population base distributed across eight different regions — including Hawaii, the most diverse state in the country,” said Alan Bauck, MBA, director of research data and analytics at the Kaiser Permanente Center for Health Research. “Taken together, Kaiser Permanente’s 12.2 million members are highly representative of the diversity of the nation.”

The data set isn’t a panacea. Kaiser Permanente’s service areas tend to be more urban than the country as a whole, which means its patient base skews more diverse. And even with 12.2 million patients, the data set isn’t always large enough to render statistically significant insights into specific categories of patients, such as those with certain rare diseases. Bauck also noted that sometimes healthcare studies require nonmedical data that don’t make it into health records.

“For example, to get a complete picture of a person’s total health, researchers may need to consider not only healthcare interactions and claims information — which Kaiser Permanente captures well — but also factors such as socioeconomic variables, personal behaviors and genomics,” Bauck said.

In those instances, the center sometimes uses data from the U.S. Census Bureau or patient-reported socioeconomic data.

Single-Payer System Leads to More Robust Data

Because of the size of Kaiser Permanente’s healthcare network, the company’s research center has some of the same data availability advantages as foreign countries with single-payer or socialized healthcare. In fact, Bauck noted, Kaiser Permanente has the world’s largest nongovernmental electronic health system.

Those rich data sets make it easier for researchers in other countries to track large cohorts of patients over time and to get more comprehensive snapshots of the overall population.

Still, Dieleman said, even single-payer systems don’t offer a perfect solution to healthcare data.

“The difference is that in countries where there is a single-payer system, those data are closer (although still not perfectly representative of health services),” he said. “For example, (the National Health Service) in the (United Kingdom) only includes about 80 percent of total healthcare spending.”

Other countries also tend to protect their data in a manner like that of U.S. commercial insurers, with barriers like lengthy applications and high costs. Still, researchers in those countries can at least know that breaking through the barrier of one provider — the government — will unlock a high percentage of healthcare data.

Efforts to Improve U.S. Data Availability

Back in the U.S., a number of efforts have been made over the years to improve data accessibility.

“There have been some attempts to provide a central repository, such as the All Payer Claims Databases (APCDs) that exist through regulation in some states (but not all), and most APCDs limit data use for researchers and charge for that use,” Krause said. “Some private entities have also attempted to aggregate commercial claims data from select payers, but data use to researchers is limited and costly.”

Earlier this year, CMS Administrator Seema Verma, MPH, announced that the government would begin making certain healthcare data more readily available to researchers. That effort will begin with Medicare Advantage data. Next year, the agency will make Medicaid and Children’s Health Insurance Program data available, a move that could solve the problem of inconsistency in state data release policies.

Data Scientists’ Healthcare Wish Lists

In the meantime, experts who specialize in analyzing healthcare data keep scraping together as much data as they can find, even as they add more items to their wish lists.

In Texas, Krause’s center has pulled together data from Medicare and Medicaid and most of the state’s commercial insurers, including Blue Cross Blue Shield.

“We believe that this allows us to provide an accurate analysis of the insured persons in the state,” Krause said.

The caveat is that because her team doesn’t have data from the VA or Tricare (a military insurance program), or on the uninsured population, it can’t include those populations in its findings.

Krause said she would like to see Texas create an APCD requirement to help fill the gaps. That, at least, would create a comprehensive state-level data set.

“If we did, UTHealth could lead the way and apply our policies for data access to qualified researchers and also be able to use it to inform policymakers in our great state,” she said.

At the top of Dieleman’s wish list are more robust data.

“My dream data set is nationally representative claims data that (are) linked to information about an individual’s behaviors (smoking, exercise, diet, health seeking behavior, etc.) as well as other information about income, education, race and geography,” he said.

Meanwhile, Bauck, whose perch at Kaiser Permanente gives him access to some of the most robust commercial claims data of any research center, said even with great data, the goal of improving healthcare is possible only if the techniques used to analyze and reap insights from the data continue to improve.

“For us, the challenge is not necessarily acquiring new data but continually enhancing the way we utilize available data to find new answers and solutions that will ultimately improve health outcomes and healthcare delivery at Kaiser Permanente and beyond,” he said.
