Artificial intelligence, clinical decision support and mobile apps are changing healthcare.
To understand the evolving role of artificial intelligence in healthcare, it helps to step back and look at earlier AI triumphs in related fields. In 1997, for instance, IBM surprised the world by demonstrating that its supercomputer Deep Blue could defeat the reigning world chess champion, Garry Kasparov. The computer accomplished this feat because its programmers had fed it millions of chess moves, and with its advanced processing power, Deep Blue was able to analyze 200 million positions a second. But by 2018, AI had taken a major step forward: Google’s DeepMind demonstrated that its AlphaZero software could defeat the strongest chess-playing programs in the world, not with the brute strength of IBM’s Deep Blue but with machine learning. AlphaZero was not programmed with millions of moves; it was given only the basic rules of the game. By playing countless games against itself, the computer taught itself how to win.
There are similarities in the way AI has developed in healthcare. Most clinical decision support (CDS) systems on the market still rely on old-school algorithms to help physicians reach diagnostic and treatment decisions. These tools are static encyclopedias that provide deep knowledge of a wide range of medical topics but offer only simple search capabilities. Several innovative thinkers and vendors, however, are pushing past these limitations to create interactive programs that will take us into the future. These systems make use of machine learning approaches such as neural networks, extreme gradient boosting and causal forest modeling.
Several of the more sophisticated CDS tools take advantage of neural networks. These software constructs were designed to mimic the functionality of the human brain with its neurons, synapses, axons and dendrites. The human neural network is capable of taking in information from its surroundings, interpreting it and then responding with a series of “outputs,” or instructions. Similarly, AI-generated neural networks accept inputs, which they then process through several layers of artificial neurons or nodes. Eventually, these nodes generate an output signal that can be used to augment diagnostic and treatment decisions. The process is illustrated in Figures 1A and 1B.
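To make that layered structure concrete, here is a minimal sketch of a feed-forward network in Python, using the open-source PyTorch library. The layer sizes, the 64 x 64 input resolution and the two-class output are illustrative assumptions made for this article; they do not describe the architecture of any clinical product mentioned here.

```python
# A minimal sketch of the layered structure described above (illustrative only).
import torch
import torch.nn as nn

class LesionClassifier(nn.Module):
    """Toy network: flattened image in, two class scores (benign vs. malignant) out."""
    def __init__(self, n_pixels=64 * 64, n_classes=2):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(n_pixels, 256),   # layer 1: low-level features (light vs. dark regions)
            nn.ReLU(),
            nn.Linear(256, 64),         # layer 2: intermediate features (e.g., edge regularity)
            nn.ReLU(),
            nn.Linear(64, n_classes),   # output layer: a score for each diagnosis
        )

    def forward(self, x):
        return self.layers(x)

model = LesionClassifier()
image = torch.rand(1, 64 * 64)   # one flattened, placeholder "skin photo"
print(model(image))              # two raw scores, one per class
```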
Neural networks have been used to improve the diagnosis of diabetic retinopathy and melanoma and to help identify patients at high risk of complications from sepsis, heart disease and other conditions. In the case of skin cancer, the algorithms are linked to a digital camera and are trained to distinguish between normal moles and malignant lesions by evaluating hundreds of thousands of images. As Figure 1A illustrates, the network begins by examining the millions of pixels that make up an individual skin photo, searching for unique features. It does this by working through a series of layers, each containing nodes. In layer 1, the software may initially recognize differences in light and dark regions. In layer 2, it may detect differences in the edges of melanomas versus normal moles; cancers typically have irregular edges, while normal moles are typically round or oval in shape. Finally, in deeper layers, the algorithm may recognize more complex features that distinguish malignant lesions from normal moles.
As the network analyzes all these features, it assigns them weights based on the relative strength of their association with previously diagnosed melanomas and non-melanomas, and then makes a final determination for each image (the output stage). During early attempts, the software makes numerous mistakes and mislabels images. Through the process of back propagation, the program recognizes those mistakes, forcing the algorithm to rethink its conclusions and adjust the weighting of each signal pathway.
Figure 1. (A) A neural network designed to distinguish melanoma from a normal mole scans tens of thousands of images to teach itself how to recognize small differences between normal and abnormal skin growths. (B) During the process of differentiating normal from abnormal tissue, a neural network makes many mistakes. Back propagation analyzes these mistakes to help the program readjust its algorithms and improve its accuracy. (Source: Cerrato, P., Halamka, J., The Transformative Power of Mobile Medicine.)
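The cycle shown in Figure 1B, in which the network predicts, measures its error and then adjusts its weights through back propagation, can be sketched in a few lines of Python with PyTorch. The data here are random placeholders rather than clinical images, and the network dimensions are arbitrary; the sketch only shows where each step of the cycle sits.

```python
# Illustrative training loop: forward pass, error measurement, back propagation.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64 * 64, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images = torch.rand(32, 64 * 64)        # a batch of placeholder "images"
labels = torch.randint(0, 2, (32,))     # 0 = normal mole, 1 = melanoma (random stand-ins)

for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(images)             # forward pass: the network's current guesses
    loss = loss_fn(outputs, labels)     # how badly those guesses missed the labels
    loss.backward()                     # back propagation: trace the error to each weight
    optimizer.step()                    # nudge the weights to reduce the error next time
```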
Andre Esteva, of the Department of Electrical Engineering at Stanford University, and his colleagues used such a neural network on a data set containing more than 129,000 skin images. When the diagnostic skills of the software were compared with those of 21 board-certified dermatologists, the machine-learning algorithms proved as effective as the human experts.
Not all machine-learning algorithms rely on neural networking. A colorectal cancer screening tool created by Medial EarlySign, for example, uses extreme gradient boosting (XGBoost) to fuel its predictive engine. XGBoost is a more advanced form of multiple additive regression trees: it builds an ensemble of decision trees in sequence, with each new tree correcting the errors of the ones before it, and it takes advantage of the distributed, multithreaded processing power available in today’s computing environment. The algorithm forms the backbone of a commercially available tool called ColonFlag.
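For readers who want to see what gradient boosting looks like in code, the sketch below trains a gradient-boosted-trees classifier with the open-source xgboost library on synthetic data. The feature set (age, sex and a few complete blood count values) and every number in it are stand-ins chosen for illustration; this is not Medial EarlySign’s model or data.

```python
# Illustrative gradient-boosted-trees risk model on synthetic data (not ColonFlag).
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(50, 76, n),     # age in years
    rng.integers(0, 2, n),       # sex (0 = female, 1 = male)
    rng.normal(14.0, 1.5, n),    # hemoglobin, g/dL
    rng.normal(7.0, 2.0, n),     # white blood cell count, 10^9/L
    rng.normal(250.0, 60.0, n),  # platelet count, 10^9/L
])
y = rng.integers(0, 2, n)        # 1 = later diagnosed with colorectal cancer (synthetic)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X, y)

# The "risk score" for a new patient is the probability assigned to the positive class.
risk_score = model.predict_proba(X[:1])[0, 1]
print(f"Predicted risk score: {risk_score:.3f}")
```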
A large-scale study performed on data from Kaiser Permanente Northwest evaluated the colorectal cancer screening tool and found its screening algorithm more accurately identified patients at high risk of colorectal cancer, using readily available parameters, including a patient’s age, gender and complete blood count. The investigation analyzed more than 17,000 Kaiser Permanente patients, including 900 patients who already had colorectal cancer. The analysis generated a risk score for patients without the malignancy to gauge their likelihood of developing it. The researchers compared ColonFlag’s ability to predict the cancer to that derived from looking at low hemoglobin (Hgb) levels. (Hgb declines when colorectal cancer causes gastrointestinal bleeding.) ColonFlag was 34 percent better at identifying the cancer within a 180- to 360-day period, when compared to low Hgb in patients between 50 and 75 years of age. The algorithms were more sensitive for detecting tumors in the cecum and ascending colon, versus the transverse and sigmoid colon and rectum. Put another way: “The study confirms the efficacy of Medial EarlySign’s ColonFlag tool in identifying individuals with 10 times higher risk of harboring undiagnosed colorectal cancer (CRC) while still at curable stages. In many patients, ColonFlag was further able to identify risk for colorectal tumors up to 360 days earlier than its actual diagnosis using conventional practices.” (In the United States, ColonFlag has been FDA cleared under the name LGI Flag).
One of the weaknesses of the Kaiser Permanente study was its retrospective design. Looking back in time is never as reliable for detecting a true cause-and-effect relationship, or for establishing a direct impact on clinical outcomes, as a prospective analysis. Unfortunately, this is a shortcoming of many recent studies that support the role of machine learning in healthcare. Medial EarlySign has been addressing this concern: its ongoing prospective study includes more than 79,000 patients who had refused a traditional screening test and whose complete blood count and other markers allowed investigators to calculate their risk of colorectal cancer using the ColonFlag algorithm. The analysis identified 688 patients with very high risk scores, 254 of whom took the advice of their physicians and had a colonoscopy performed. Among those patients, 19 cancers were detected (7.5 percent).
Stroke is a major cause of disability in the United States and the fifth leading cause of death, affecting about 795,000 Americans each year. An application called Viz.AI Contact has received FDA clearance as a diagnostic assistant to help neurovascular specialists more accurately detect the presence of a cerebrovascular accident. As the FDA announcement explains: “The Viz.AI Contact application is designed to analyze CT images of the brain and send a text notification to a neurovascular specialist if a suspected large vessel blockage has been identified. The algorithm will automatically notify the specialist during the same time the first-line provider is conducting a standard review of the images, potentially involving the specialist sooner than the usual standard of care in which patients wait for a radiologist to review CT images and notify a neurovascular specialist. The notification can be sent to a mobile device, such as a smart phone or tablet, but the specialist still needs to review the images on a clinical workstation…” A retrospective study has demonstrated that the algorithm is as effective as neuroradiologists in detecting large vessel blockages in the brain: the software accurately detected imaging findings that indicate the presence of a stroke in 95 percent of cases, with a sensitivity of 88 percent and a specificity of 90 percent. While the software can spot a stroke faster than a human neurologist, it is not intended to replace clinicians but only to assist them in arriving at a definitive diagnosis.
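The sensitivity and specificity figures quoted above are simple ratios taken from a confusion matrix, and the arithmetic is easy to reproduce. The counts below are hypothetical, chosen only so the numbers work out to the reported 88 percent and 90 percent; they are not the actual counts from the Viz.AI study.

```python
# Hypothetical confusion-matrix counts, used only to illustrate the arithmetic.
true_positives = 88     # large vessel blockages the software correctly flagged
false_negatives = 12    # blockages the software missed
true_negatives = 90     # normal scans the software correctly left unflagged
false_positives = 10    # normal scans the software incorrectly flagged

sensitivity = true_positives / (true_positives + false_negatives)   # 88 / 100 = 0.88
specificity = true_negatives / (true_negatives + false_positives)   # 90 / 100 = 0.90
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```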
Like strokes, sepsis can prove devastating to patients who are not promptly diagnosed, affecting more than 700,000 Americans annually and costing over $20 billion a year. The Duke Institute for Health Innovation has been developing a program called Sepsis Watch to help detect the condition at an early stage of development. The deep learning-enhanced project incorporates a wide variety of parameters to build its predictive model, including vital signs, lab data and medical history. The algorithm has already been trained on 50,000 patient records and more than 32 million data points. If the system flags a patient with the early signs of sepsis, it informs the rapid response team at Duke University Hospital to take action.
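Conceptually, the alerting step works like the schematic below: a model scores a patient’s latest vitals and labs, and the rapid response team is paged when the risk crosses a threshold. The feature names, the toy scoring function and the threshold are all assumptions invented for this sketch; they are not the Duke Sepsis Watch model.

```python
# Schematic alerting logic; the scoring function is a crude stand-in for a trained model.
from dataclasses import dataclass

@dataclass
class PatientSnapshot:
    heart_rate: float          # beats per minute
    temperature_c: float       # degrees Celsius
    respiratory_rate: float    # breaths per minute
    lactate: float             # mmol/L
    white_cell_count: float    # 10^9/L

def sepsis_risk(p: PatientSnapshot) -> float:
    """Placeholder for a trained model's risk output, scaled 0 to 1."""
    score = 0.0
    score += 0.25 if p.heart_rate > 100 else 0.0
    score += 0.20 if p.temperature_c > 38.3 or p.temperature_c < 36.0 else 0.0
    score += 0.20 if p.respiratory_rate > 22 else 0.0
    score += 0.25 if p.lactate > 2.0 else 0.0
    score += 0.10 if p.white_cell_count > 12.0 else 0.0
    return score

ALERT_THRESHOLD = 0.6   # invented cut-off for this sketch

patient = PatientSnapshot(heart_rate=118, temperature_c=38.9,
                          respiratory_rate=24, lactate=2.4, white_cell_count=14.0)
if sepsis_risk(patient) >= ALERT_THRESHOLD:
    print("Notify the rapid response team: possible early sepsis")
```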
The Duke initiative joins other innovators in the specialized area of medical informatics. There are several traditional risk scoring systems in place in U.S. hospitals to help detect severe sepsis, including the Sequential Organ Failure Assessment, the Systemic Inflammatory Response Syndrome criteria and the Modified Early Warning Score. But a machine-learning system called InSight has been shown to outperform these scoring systems in a randomized controlled clinical trial in medical/surgical intensive care units at the University of California, San Francisco Medical Center. The average length of stay for patients evaluated with the InSight software was about 20 percent lower than that observed in the control arm of the trial (10.3 v. 13.0 days). Equally impressive was the fact that in-hospital mortality was 8.96 percent in the machine-learning group versus 21.13 percent in those evaluated with the more traditional assessment tools.
The machine-learning program was incorporated into the hospital’s electronic health record (EHR) system, eliminating the need for clinicians to move outside the patient record to access a separate system, an obstacle that often impedes physicians and nurses. The vital signs and related lab results needed to conduct the assessment were readily obtained from the EHR system, which in this trial was APeX from Epic. Shimabukuro and colleagues also note that, “Patients in the experimental group additionally received antibiotics an average of 2.76 hours earlier than patients in the control group and had blood cultures drawn an average of 2.79 hours earlier than patients in the control group.”
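As context for how a CDS tool can pull the data it needs without the clinician leaving the chart, the sketch below issues a standard HL7 FHIR query for a patient’s most recent vital-sign observations. The server address and patient identifier are placeholders, and the published trial does not describe the specific interface used with APeX, so this is a generic illustration rather than a description of that deployment.

```python
# Generic FHIR query for recent vital signs; the endpoint and patient ID are placeholders.
import requests

FHIR_BASE = "https://ehr.example.org/fhir"   # placeholder FHIR server
PATIENT_ID = "12345"                         # placeholder patient identifier

response = requests.get(
    f"{FHIR_BASE}/Observation",
    params={"patient": PATIENT_ID, "category": "vital-signs",
            "_sort": "-date", "_count": 10},
    headers={"Accept": "application/fhir+json"},
    timeout=10,
)
response.raise_for_status()

for entry in response.json().get("entry", []):
    obs = entry["resource"]
    name = obs["code"]["coding"][0].get("display", "unknown")
    quantity = obs.get("valueQuantity", {})
    print(name, quantity.get("value"), quantity.get("unit"))
```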
Although numerous research projects have shown that AI-enhanced systems hold great promise and will likely usher in a new era in patient care, these programs have their shortcomings. Besides the fact that many rely on retrospective analysis of patient records, there is also concern about the quality of the data sets used to train these algorithms. If the data feeding these algorithms are not truly representative of the patient populations they are intended to serve, the positive results being published can be misleading.
In one case that stands out, researchers built a neural network model intended to detect pneumonia from its interpretation of chest X-rays. The project used pooled data from two large hospitals, but when the team tried to replicate the findings using data from a third hospital system, the model failed. “In further analyses, the researchers found evidence that this model exploited imperceptible (to humans) image features associated with the hospital system and department, to a greater extent than image features of pneumonia, and that hospital system and department were themselves predictors of pneumonia in the pooled training data set.”
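One simple sanity check inspired by that finding is to ask whether the site identifier alone predicts the label; if it does, a model trained on the pooled data may learn where an image came from rather than what it shows. The sketch below runs that check on synthetic data with scikit-learn; the hospital names and prevalence figures are invented.

```python
# Check whether the site identifier alone predicts the label (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
site = rng.choice(["hospital_A", "hospital_B"], size=2000)

# Invented prevalence gap between sites, mimicking a pooled training set.
pneumonia = np.where(site == "hospital_A",
                     rng.random(2000) < 0.30,
                     rng.random(2000) < 0.10).astype(int)

X = (site == "hospital_A").astype(float).reshape(-1, 1)   # site is the only feature
model = LogisticRegression().fit(X, pneumonia)
auc = roc_auc_score(pneumonia, model.predict_proba(X)[:, 1])
print(f"AUC from site identity alone: {auc:.2f}")   # well above 0.5 signals confounding
```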
About the Authors
Paul Cerrato has more than 30 years of experience working in healthcare as a clinician, educator and medical editor. He has written extensively on clinical medicine, clinical decision support, electronic health records, protected health information security and practice management. He has served as editor of Information Week Healthcare, executive editor of Contemporary OB/GYN, senior editor of RN Magazine and contributing writer/editor for the Yale University School of Medicine, the American Academy of Pediatrics, Information Week, Medscape, Healthcare Finance News, IMedicalapps.com and Medpage Today. HIMSS has listed Mr. Cerrato as one of the most influential columnists in healthcare IT.
John D. Halamka, M.D., leads innovation for Beth Israel Lahey Health. Previously, he served for over 20 years as the chief information officer (CIO) at the Beth Israel Deaconess Healthcare System. He is chairman of the New England Healthcare Exchange Network (NEHEN) and a practicing emergency physician. He is also the International Healthcare Innovation professor at Harvard Medical School. As a Harvard professor, he has served the George W. Bush administration, the Obama administration and national governments throughout the world, planning their healthcare IT strategies. In his role at BIDMC, Dr. Halamka was responsible for all clinical, financial, administrative and academic information technology, serving 3,000 doctors, 12,000 employees, and 1 million patients.