March 8, 2021

Machine-Learning Algorithm Identifies Incident Stroke

The algorithm can be adopted by other hospitals and health systems to identify incident stroke.

A machine-learning algorithm performs well for identifying incident stroke and for determining type of stroke.

The algorithm’s performance in a general population sample demonstrated its generalizability and potential to be adopted by other hospitals and health systems.

Nicholas Larson, Ph.D., and colleagues developed a machine learning-based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. The predictive modeling study used observational cohort data for training and validation. An atrial fibrillation (A-fib) cohort was used to train and test the phenotyping algorithm for the date of incident stroke events. The generalizability of the algorithm was evaluated in a general population cohort.

A patient population from Minnesota made up the A-fib cohort. All healthcare-related events were extracted through the Rochester Epidemiology Project. Data included demographic information, diagnostic and procedure codes, healthcare utilization data, outpatient drug prescriptions, results of laboratory tests, and information about smoking, height, weight, and body mass index.

The algorithm aimed to identify first stroke events within a certain time frame. The team used three major data elements: clinical concepts, ICD-9 codes, and CPT codes. They constructed different models by varying whether CPT codes and symptom-related clinical concepts were included in the feature set, then compared the models' performance. Clinical concepts were identified from the major and secondary problem lists in the Mayo Clinic EHR and, using a natural language processing system, from clinical notes at other Rochester Epidemiology Project sites.

Larson and the investigators created a data set of 9,130 confirmed visits, labeled as stroke or nonstroke, among 1,773 patients. There were 746 stroke visits and 8,384 nonstroke visits. Data from a randomly selected 79.98% of screened patients served as the training set, and the other 20.02% were retained as an independent testing set.
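A patient-level split like the one described above can be sketched as follows. This is a minimal illustration, not the study's code: the record fields (`patient_id`, `visit`, `stroke`), the function name `split_by_patient`, the seed, and the toy data are all assumptions. The point it shows is that patients, not individual visits, are assigned to the training or testing set, so no patient's visits appear in both.

```python
import random

def split_by_patient(visits, train_fraction=0.8, seed=0):
    """Split visit records so that each patient lands in exactly one set."""
    # Collect unique patient IDs and shuffle them reproducibly.
    patient_ids = sorted({v["patient_id"] for v in visits})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    # Assign the first share of shuffled patients to the training set.
    n_train = round(len(patient_ids) * train_fraction)
    train_ids = set(patient_ids[:n_train])

    train = [v for v in visits if v["patient_id"] in train_ids]
    test = [v for v in visits if v["patient_id"] not in train_ids]
    return train, test

# Toy data (hypothetical): 10 patients with 3 visits each.
visits = [
    {"patient_id": pid, "visit": k, "stroke": int(pid % 5 == 0 and k == 0)}
    for pid in range(10)
    for k in range(3)
]
train_visits, test_visits = split_by_patient(visits)
```

Splitting by patient rather than by visit keeps the testing set independent, since repeat visits from one patient tend to share features.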

Phenotype models were trained using logistic regression and random forest. The team evaluated the generalizability of the model on a sample from a general population cohort of more than 71,000 patients who were at least 30 years old and had no prior history of cardiovascular disease. The best-performing model was applied to the entire population cohort to generate incident stroke predictions. For evaluation, 50 patients were then randomly selected from those who had no stroke-related features, 50 from those with negative stroke predictions, and 50 from those with positive stroke predictions and a predicted incident stroke.
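The three-group review sample described above could be drawn along these lines. This is a hedged sketch only: the field names `has_stroke_features` and `predicted_stroke`, the function `sample_for_review`, the toy cohort, and the assumption that all three groups are drawn uniformly at random are illustrative and not taken from the study.

```python
import random

def sample_for_review(predictions, per_group=50, seed=0):
    """Draw up to `per_group` patients from each of three prediction groups."""
    groups = {
        "no_stroke_features": [],
        "predicted_negative": [],
        "predicted_positive": [],
    }
    for patient_id, record in predictions.items():
        if not record["has_stroke_features"]:
            groups["no_stroke_features"].append(patient_id)
        elif record["predicted_stroke"]:
            groups["predicted_positive"].append(patient_id)
        else:
            groups["predicted_negative"].append(patient_id)

    rng = random.Random(seed)
    # Sort first so the draw is reproducible regardless of dict ordering.
    return {
        name: rng.sample(sorted(ids), min(per_group, len(ids)))
        for name, ids in groups.items()
    }

# Toy cohort (hypothetical): 600 patients, 200 in each group.
predictions = {
    pid: {
        "has_stroke_features": pid % 3 != 0,
        "predicted_stroke": pid % 3 == 2,
    }
    for pid in range(600)
}
review_sample = sample_for_review(predictions)
```

Sampling a fixed number from each group, rather than from the cohort as a whole, ensures that the rarer positive predictions are represented in the manual review.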

Overall, of 4,914 patients with A-fib, 740 had validated incident stroke events. The best performing algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier.

Among those with stroke codes in the general population sample, the best-performing model had a positive predictive value of 86% (95% CI, 74%-93%) and a negative predictive value of 96%. For subtype identification, the team achieved an accuracy of 83% in the A-fib cohort and 80% in the general population sample.
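For readers less familiar with these metrics, the sketch below shows how positive and negative predictive values are computed from a confusion matrix, along with one common way to attach a 95% confidence interval to a proportion (the Wilson score interval). The counts are hypothetical, chosen only so that the point estimates match the reported 86% and 96%. The article does not give the underlying counts or say which interval method the authors used.

```python
import math

def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts."""
    ppv = tp / (tp + fp)  # share of positive predictions that are true strokes
    npv = tn / (tn + fn)  # share of negative predictions that are truly nonstroke
    return ppv, npv

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Hypothetical counts (assumption): 50 reviewed positives, 50 reviewed negatives.
tp, fp, tn, fn = 43, 7, 48, 2
ppv, npv = predictive_values(tp, fp, tn, fn)
ppv_low, ppv_high = wilson_interval(tp, tp + fp)

print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
print(f"95% CI for PPV: {ppv_low:.2f} to {ppv_high:.2f}")
```

With review samples of this size the interval is wide, which is why the confidence interval matters as much as the point estimate when judging generalizability.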

The findings demonstrated that incorporating structured EHR data can help distinguish incident stroke mentions from historical events in clinical notes. Based on the algorithm's performance in the general population cohort, it could be adopted by other institutions.

The study, “Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation,” was published online in the Journal of Medical Internet Research.
