Will AI Improve Cancer Diagnosis and Treatment?

September 9, 2019

By Paul Cerrato, M.A.
John Halamka, M.D., M.S.

Stakeholders hope AI, machine learning will significantly decrease death toll.

In his Pulitzer Prize-winning book, Siddhartha Mukherjee, M.D., called cancer the “Emperor of All Maladies.” The term is spot on. The disease commands the attention of all who come in contact with it, whether they be patients, clinicians, or technologists, claiming the lives of more than 8 million worldwide and disrupting the lives of millions more. That’s not to suggest that we aren’t making any progress. The latest statistics from the American Cancer Society state that the U.S. cancer death rate has dropped by 27% in 25 years. But that still means about 1.7 million new cancer cases and over 606,000 deaths are expected this year alone. Many stakeholders are hoping that advances in artificial intelligence and machine learning will make a significant dent in these numbers.

Diagnostic Advances

In a previous article, we discussed the value of machine-learning algorithms to assist in the diagnosis of melanoma and to improve colorectal cancer screening. The software used to help detect skin cancer takes advantage of neural networks while the colorectal screening system, commercially available as ColonFlag in Europe and LGI Flag in the United States, uses a machine-learning tool called extreme gradient boosting. There have been similar advances in several other specialties, including breast and lung cancers.

To make sense of the role of ML in breast cancer risk assessment, it helps to understand how breast density fits into the picture. Women with denser breasts, which contain more fibrous and glandular tissue, are at greater risk of the malignancy than women with fatty breasts. Traditionally, there have been two risk models to help clinicians in this area: the Gail model and, more recently, the Tyrer-Cuzick (TC) model. The latter has been incorporated into the Gail model to take into consideration a woman’s breast density. The TC model is now considered the clinical standard. However, because assessing a patient’s breast density can be subjective, it may be possible to improve the assessment using deep learning to analyze mammograms for subtle differences not detectable by the naked eye. Adam Yala, a Ph.D. candidate from the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, and his colleagues have developed an algorithm that looks at full-field mammography images to assist clinicians in evaluating breast tissue density.

Yala and his colleagues used data from patients’ EHRs and questionnaires to evaluate their likelihood of developing breast cancer using 3 different models. One used traditional risk factors, including the TC model and logistic regression, the second used a convolutional neural network (CNN) to analyze mammogram images, and the third combined both approaches. The analysis found that the CNN was much better than the TC model for predicting breast cancer; combining CNN with TC was even better. One of the surprising findings was that women who did not have dense breasts but were considered at high risk based on this more sophisticated risk assessment model were almost four times as likely to have developed cancer, when compared to those with dense breasts and model-assessed low risk.

Machine learning is also helping pathologists as they try to interpret biopsy slides and determine whether or not a patient’s breast cancer has metastasized to nearby lymph nodes. Yun Liu, Ph.D., from Google AI Healthcare, and associates applied an algorithm called Lymph Node Assistant (LYNA) to two separate data sets of pathology slide images. The first set of slides was also reviewed by one of two pathologists while slides from the second data set were interpreted by two pathologists. Comparing LYNA to the skills of a practicing pathologist, investigators found an area under the curve (AUC) of 99.3% for nodal metastasis present or absent for the first data set. (AUC, the area under the receiver operating characteristic (ROC) curve, is a metric that summarizes how well a test distinguishes between two diagnostic groups, in this case slides that indicate metastasis vs no metastasis.) Pathologists returned an AUC of 96.6% when they were allowed to take as much time as they needed to arrive at a diagnosis. That AUC dropped to 81% when a real-world scenario was put in place: They were only allowed one minute per slide. Liu and associates concluded that: “Artificial intelligence algorithms can exhaustively evaluate every tissue patch on a slide, achieving higher tumor-level sensitivity than, and comparable slide level performance to, pathologists. These techniques may improve the pathologist’s productivity and reduce the number of false negatives associated with morphologic detection of tumor cells.” In a separate study that used LYNA, David Steiner and associates demonstrated that combining the algorithm with the services of a human pathologist resulted in more accurate detection of micrometastases than either the pathologists on their own or LYNA on its own.
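For readers who want to see what an AUC figure measures, here is a minimal sketch in Python. It uses the standard interpretation of AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The function name and the six slide scores are hypothetical illustrations; they are not taken from the LYNA study.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count half).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for six slides (1 = metastasis present).
scores = [0.95, 0.80, 0.60, 0.55, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))  # 8 of 9 positive/negative pairs ranked correctly
```

An AUC of 1.0 means every positive slide outranks every negative slide, while 0.5 is what random guessing would produce.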

One of the challenges facing pathologists is their lack of consistency in making a cancer diagnosis. ML-enhanced algorithms may eventually solve this problem by supplementing pathologists’ judgment. Angel Cruz-Roa and his colleagues addressed this challenge using a convolutional neural network to improve the diagnosis of invasive breast cancer in whole slide images. Invasive disease was defined as the spread of the tumor beyond the breast’s milk ducts or lobules. The researchers used a ConvNet classifier and validated it with images from three different institutions. They then evaluated it using pathologic and normal cases from the Cancer Genome Atlas and University Hospitals Case Medical Center in Cleveland. When compared to pathologists’ findings, the classifier generated a positive predictive value of 71.6% and a negative predictive value of 96.8%. One of the criticisms of ML-based diagnostic tools is that they are often trained on a single data set and may not be generalizable to the larger patient population. The research team addressed this concern by using 2 different data sets. Comparing the two resulted in highly correlated performance measures (r ≥ 0.8).
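Positive and negative predictive values follow directly from a table of true and false calls. The sketch below shows the arithmetic in Python. The function name and the counts are hypothetical, chosen only to give round numbers; they are not the study's data.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV: of the cases the classifier flagged as cancer, the share that were cancer.
    NPV: of the cases the classifier called normal, the share that were normal.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one positive call and one negative call")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical counts, for illustration only.
ppv, npv = predictive_values(tp=72, fp=28, tn=970, fn=30)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 72.0%, NPV = 97.0%
```

A high negative predictive value paired with a lower positive predictive value, as in the study above, means a "normal" call is rarely wrong while a "cancer" call still needs confirmation.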

Technologists are hoping that AI-based diagnostic tools like this will help address the epidemic of diagnostic errors we face in clinical medicine. Delayed diagnosis is considered one of the four common types of diagnostic error, alongside missed diagnosis, misdiagnosis (that is, incorrectly diagnosed disease), and overdiagnosis. Unfortunately, technology can’t solve all these problems. A report from the Institute for Healthcare Improvement (IHI) and CRICO, the risk management foundation of the Harvard Medical Institutions, points out that there are over 100 million outpatient referrals to specialists annually, but up to half of these referrals are never completed. A single missed referral can easily lead to a delayed diagnosis and a malpractice claim. The report explains: “Of malpractice claims related to missed or delayed diagnosis in the ambulatory setting, almost half involve failure to follow up, many of which involve problems with specialist referrals.”

Luke Sato, M.D., chief medical officer at CRICO, explains that the diagnostic errors that lead to malpractice claims fall into two broad categories: cognitive errors and system-based errors. Although diagnostic errors may result from the failure of a physician to order a colonoscopy, for instance, or refer a patient to the appropriate specialist, CRICO has found that errors also result from failing to “close the loop,” that is, from a fault in the healthcare system that somehow interferes with the referral process. How serious is this problem? Sato points out that although about 70% of primary care physicians say that they include the patient’s history and the reason for the referral “always” or “most of the time” when they refer patients to a specialist, fewer than 35% of specialists say they receive this information. In some cases, the technology is unavailable within the EHR to allow clinical or administrative staff to properly close the loop. In other cases, it is available but not well utilized. The IHI/CRICO white paper discusses a long list of communication breakdowns responsible for the problem, but part of the problem is a lack of accountability. In the U.S. healthcare ecosystem, there is no one consistently held responsible for scheduling the process. Equally troubling is the fact that no one is consistently in charge of knowledge management.

Tackling Cervical Cancer with Deep Learning

While diagnostic errors remain a stubborn, expensive problem, there are many success stories worth telling. The prevention of cervical cancer in the United States is one of the genuine triumphs in oncology. In the 1940s, the disease was a major cause of death among women of childbearing age in the U.S., but since the Pap smear was introduced in the 1950s, invasive cervical cancer has dropped dramatically: From 1955 to 1992, its incidence and death rate dropped by more than 60%. Unfortunately, the disease continues to be a major cause of death and suffering in poorer countries that can’t afford routine Pap smear screening. Worldwide, there are about 500,000 cases, and 80% of them occur in low- and middle-income countries. A recent data analysis suggests that an automated visual evaluation of cervical images, developed with the help of a convolutional neural network, can have a significant impact on this worldwide problem. (Two video tutorials are available that provide a basic understanding of how neural networks work. One was created for an episode of PBS’s Nova show. The other is posted on the JAMA web site.)

Currently, clinicians in these poorer countries use an acetic acid test to estimate if a woman has precancer of the cervix. The test involves applying acetic acid to the cervix to see if the tissues turn white, which suggests precancer or cancer. While the test is inexpensive, its interpretation is subjective, and it isn’t very accurate in differentiating between precancer and more common minor abnormalities. That in turn results in overtreatment and undertreatment. Liming Hu, Ph.D., with the Intellectual Ventures Global Good Fund, Bellevue, Washington, and his colleagues analyzed a data set of cervical images from over 9,000 women in Costa Rica who had been screened for the malignancy and followed for seven years.

The automated visual evaluation algorithm was designed to accomplish two tasks: It located the cervix on the camera image, and it predicted the likelihood of the patient having cervical intraepithelial neoplasia (CIN2+), a group of precancerous lesions. Hu and his colleagues found that the machine-learning enabled algorithm was more accurate than manually performed visual inspection of the cervix and more accurate than Pap smears.

Outsmarting the Human Eye

Radiologists have two invaluable tools to help them detect diagnostic clues in the images they review: their eyes and their clinical training. But the human eye obviously has its limitations. There are millions of pixels in a typical X-ray, making it virtually impossible to notice subtle abnormalities that can be easily detected by a machine learning-based algorithm. The computer’s advantage has been observed in numerous diagnostic settings, including lung cancer.

Although screening of high-risk patients with computed tomography (CT) scans has been shown to reduce the death rate in lung cancer patients, many cancers are still missed due to misjudgments in reading and interpreting the scans. Several software systems have been used to help address this problem, including those that rely on deep learning neural networks. Ross Gruetzemacher, a Ph.D. candidate with the Department of Systems & Technology, Auburn University in Alabama, and his colleagues have created a two-tier system to analyze CT scans, looking for suspicious pulmonary nodules that suggest lung cancer. One neural network was used to locate nodule candidates worth investigating and the second network helped eliminate false positive readings of the CT scans. Gruetzemacher and his associates used a data set derived from the National Cancer Institute’s Cancer Imaging Archive. They explain in their report: “Deep learning offers unique advantages over traditional methods for nodule detection. Rather than relying on mathematical techniques or hand-crafted features, deep learning enables learning internal feature representations directly from the input data.” Put another way, neural networks teach themselves how to recognize nodules by evaluating thousands of images. Their two-tiered system worked with a detection rate of 89.3%, with only 1.7 false positives per scan.
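The two-tier idea, and the two figures used to score it, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' implementation: the two neural networks are replaced by precomputed stand-in scores, the thresholds and all names are hypothetical, and every true nodule is assumed to appear in the candidate list.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    scan_id: str
    stage1_score: float  # stand-in for the candidate-detection network's output
    stage2_score: float  # stand-in for the false-positive-reduction network's output
    is_nodule: bool      # ground-truth annotation


def two_stage_detect(candidates, stage1_threshold=0.3, stage2_threshold=0.5):
    """Stage 1 proposes candidates; stage 2 discards likely false positives."""
    proposed = [c for c in candidates if c.stage1_score >= stage1_threshold]
    return [c for c in proposed if c.stage2_score >= stage2_threshold]


def evaluate(candidates, kept, n_scans):
    """Return (detection rate, false positives per scan)."""
    total_nodules = sum(c.is_nodule for c in candidates)
    detected = sum(c.is_nodule for c in kept)
    false_positives = sum(not c.is_nodule for c in kept)
    return detected / total_nodules, false_positives / n_scans


# Hypothetical candidates from two scans.
candidates = [
    Candidate("scan-a", 0.9, 0.8, True),
    Candidate("scan-a", 0.6, 0.2, False),  # proposed by stage 1, removed by stage 2
    Candidate("scan-a", 0.2, 0.9, True),   # missed by stage 1
    Candidate("scan-b", 0.7, 0.7, False),  # false positive that survives both stages
    Candidate("scan-b", 0.8, 0.9, True),
    Candidate("scan-b", 0.5, 0.6, True),
]
kept = two_stage_detect(candidates)
detection_rate, fp_per_scan = evaluate(candidates, kept, n_scans=2)
print(detection_rate, fp_per_scan)  # 0.75 0.5
```

The design trade-off is that the first stage is tuned to miss as few nodules as possible, accepting many false alarms, while the second stage exists to bring the false positives per scan down to a level a radiologist can review.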

Machine learning will never replace an experienced clinician’s diagnostic skills, but as these research projects demonstrate, it is slowly emerging as a valuable complement that will likely save lives.

Paul Cerrato has more than 30 years of experience working in healthcare as a clinician, educator, and medical editor. He has written extensively on clinical medicine, electronic health records, protected health information security, practice management, and clinical decision support. He has served as editor of Information Week Healthcare, executive editor of Contemporary OB/GYN, senior editor of RN Magazine, and contributing writer/editor for the Yale University School of Medicine, the American Academy of Pediatrics, Information Week, Medscape, Healthcare Finance News, IMedicalapps.com, and Medpage Today. HIMSS has listed Mr. Cerrato as one of the most influential columnists in healthcare IT.

John D. Halamka, M.D., leads innovation for Beth Israel Lahey Health. Previously, he served for over 20 years as the chief information officer (CIO) at the Beth Israel Deaconess Healthcare System. He is chairman of the New England Healthcare Exchange Network (NEHEN) and a practicing emergency physician. He is also the International Healthcare Innovation professor at Harvard Medical School. As a Harvard professor, he has served the George W. Bush administration, the Obama administration and national governments throughout the world, planning their healthcare IT strategies. In his role at BIDMC, Dr. Halamka was responsible for all clinical, financial, administrative and academic information technology, serving 3,000 doctors, 12,000 employees, and 1 million patients.
