Creating a data pipeline for safe and effective healthcare AI | Viewpoint

For artificial intelligence to realize its full potential in healthcare, models must be trained on high-quality data, but unstructured and siloed data are hampering progress.

Hospitals produce 50 petabytes of data per year, which is more than double the amount of data managed by the Library of Congress in 2022.

Tim O’Connell

Effectively leveraging that health data could help providers, payers, and researchers improve patient and population outcomes, spur innovation and medical advances, increase operational efficiency, and drive down overall healthcare costs.

Unfortunately, leveraging health data can be extremely challenging because a significant proportion of that data is unstructured, in the form of PDFs, faxes, or prose-format clinical notes.

Even when health data is structured, much of it is trapped in silos that make it difficult to extract and share. That's a huge obstacle, because the data needed to fuel the most impactful discoveries and explore new frontiers in healthcare is buried in clinical notes.

The AI promise

Artificial intelligence (AI) models carry the promise of improving workforce performance and administrative efficiency in provider organizations. This can help ease clinician burnout caused by understaffing.

AI can complete tasks such as manual chart review in seconds or minutes, where provider staff might need hours. Further, AI's ability to identify care gaps and optimize claims coding can enable providers to generate new revenue.

Additionally, AI can unlock the value of other data – such as social determinants of health (SDoH) – that can be used to inform clinical decisions and research. SDoH data, which includes economic stability, education level, access to transportation, healthcare access and quality, neighborhood environment and safety, and social interaction and support, is crucial to developing a whole-person approach to care and to shaping public health policy.

AI models also have many use cases throughout medicine beyond direct clinical care. Payers can use these models to accelerate and strengthen risk adjustment performance, improve underwriting and pricing accuracy, and spot care gaps and conditions for referral to clinical programs. Pharmaceutical companies can deploy AI to accelerate the process of finding the most appropriate clinical trial participants, compile and analyze post-launch real-world evidence (RWE), and power new drug and therapeutic discovery.

Breaking down the data bottleneck

For AI to realize its full potential in healthcare, models must be trained on high-quality, accessible data. Unfortunately, unstructured and siloed data are disrupting what should be a smooth-running data pipeline, severely compromising data quality and undermining AI's potential in healthcare.

Complicating matters, extracting data from silos and unstructured sources is only the first step. That data must also be organized, separated, coded, and summarized before it becomes usable. Existing technologies often address only one step in this complex process, and narrow AI applications, while effective at specific tasks, lack the breadth to handle the entire data journey.

Conversely, large language models (LLMs) can perform a broad range of tasks but perform poorly on real-world medical data, which limits their usefulness on their own in clinical settings. Most ominously, LLMs are prone to hallucinations, in which they "make up" facts – a dealbreaker in healthcare if there ever was one.

Healthcare organizations cannot afford to rely on fragmented solutions or unproven technologies. What they need is a medically aligned, end-to-end data pipeline – a single, unified platform designed to manage the entire data lifecycle. Such a pipeline enables seamless integration, transforming raw, unstructured, or siloed data into actionable insights.

By addressing the entire journey – from extraction and organization (including the splitting of PDFs and the digitization and interpretation of handwritten notes and documents) to summarization, contextualization, and actualization – this pipeline ensures that organizations can maximize the clinical, operational, and financial value of their data. Without it, AI’s potential remains untapped.
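To make those stages concrete, here is a minimal, illustrative sketch of how such a pipeline might be composed. The stage names, data structure, and function signatures are assumptions for illustration only; they do not describe any specific vendor's product, and each stub marks where a real OCR engine, medical NLP model, or terminology service would plug in.

```python
from dataclasses import dataclass, field


@dataclass
class ClinicalDocument:
    """One unstructured source document: a PDF page, fax, or clinical note."""
    source_id: str
    raw_text: str
    structured: dict = field(default_factory=dict)


def extract(raw_bytes: bytes, source_id: str) -> ClinicalDocument:
    """Extraction: turn PDFs, faxes, or scanned notes into plain text.
    Stub only; a real system would call an OCR or document-parsing service."""
    return ClinicalDocument(source_id, raw_bytes.decode("utf-8", errors="ignore"))


def organize(doc: ClinicalDocument) -> ClinicalDocument:
    """Organization/separation: split the note into sections for downstream steps."""
    doc.structured["sections"] = [s.strip() for s in doc.raw_text.split("\n\n") if s.strip()]
    return doc


def code_terms(doc: ClinicalDocument) -> ClinicalDocument:
    """Coding: map clinical mentions to standard vocabularies (ICD-10, SNOMED CT).
    Stub only; this is where a medical NLP engine would plug in."""
    doc.structured["codes"] = []
    return doc


def summarize(doc: ClinicalDocument) -> ClinicalDocument:
    """Summarization: produce a short, clinician-reviewable summary of the document."""
    sections = doc.structured.get("sections", [])
    doc.structured["summary"] = sections[0][:200] if sections else ""
    return doc


def run_pipeline(raw_bytes: bytes, source_id: str) -> ClinicalDocument:
    """End-to-end: extraction -> organization -> coding -> summarization."""
    doc = extract(raw_bytes, source_id)
    for stage in (organize, code_terms, summarize):
        doc = stage(doc)
    return doc
```

In a real deployment, each stage would be replaced by a production component (document OCR, clinical entity recognition, terminology mapping, summarization), but the overall shape, a single pipeline carrying a document from raw bytes to coded, summarized output, is the point of the end-to-end approach.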

Designing a safe and effective data pipeline

While the promise of AI in healthcare is immense, its adoption has been slowed by legitimate safety and ethical concerns. The American Medical Association (AMA) and other organizations have proposed guidelines for ensuring that AI is deployed in a manner that is ethical, equitable, and transparent. However, the responsibility for implementing these principles ultimately lies with healthcare organizations. To ensure safe and effective AI, organizations must:

  • Develop policies ensuring that AI applications are appropriate for clinical use and aligned with the current standard of care.
  • Engage clinical experts to assess the quality and relevance of AI tools from a multi-disciplinary perspective, and understand how clinicians at different levels of training use and are affected by these tools.
  • Understand and work to eliminate bias, which can perpetuate health inequities by favoring certain populations over others.
  • Continuously monitor AI performance to detect and address emerging biases or degrading accuracy in dynamic data environments.

Equally important is maintaining human oversight throughout the AI process. Clinicians must remain actively involved, using their expertise to validate AI-generated insights and make critical decisions. This approach aligns with principle #7 of the Good Machine Learning Practice for Medical Device Development guiding principles, a framework developed by regulatory agencies including the FDA and Health Canada. Keeping a "human in the loop" ensures that AI supports, rather than replaces, the nuanced judgment and experience of healthcare professionals.
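As a rough illustration, a human-in-the-loop checkpoint can be as simple as routing any AI-generated finding below a confidence threshold to a clinician review queue, and treating even high-confidence findings as suggestions until a clinician confirms them. The threshold, field names, and queue below are hypothetical and exist only to show the pattern.

```python
from dataclasses import dataclass

# Illustrative cutoff; a real deployment would set and validate this clinically.
REVIEW_THRESHOLD = 0.90


@dataclass
class AIFinding:
    patient_id: str
    suggestion: str    # e.g., a proposed diagnosis code or care-gap flag
    confidence: float  # model-reported confidence between 0 and 1


def route_finding(finding: AIFinding, review_queue: list) -> str:
    """Send low-confidence findings to clinicians; keep the rest advisory only."""
    if finding.confidence < REVIEW_THRESHOLD:
        review_queue.append(finding)
        return "queued_for_clinician_review"
    # Even above the threshold, the output remains a suggestion that a clinician
    # must confirm before it enters the record or drives a decision.
    return "presented_as_suggestion"
```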

By incorporating these ethical imperatives into the implementation of a medically aligned data pipeline, healthcare organizations can build a strong foundation for AI adoption. This approach not only safeguards patient trust but also ensures that AI fulfills its potential to transform healthcare in a safe, equitable, and impactful way.

Conclusion

AI can be a catalyst and enabler of healthcare transformation, whole-person care, and new therapies. Yet AI can't meet these ambitious expectations if it is limited in the amount of data it can access, extract, and organize. A comprehensive, end-to-end pipeline that is purpose-built for health data can provide the safe and ethical solution needed to transform healthcare.

Tim O’Connell is CEO and co-founder of emtelligent and a practicing radiologist.
