Many Models, Little Adoption—What Accounts for Low Uptake of Machine Learning Models for Atrial Fibrillation Prediction and Detection?

Yuki Kawamura, Alireza Vafaei Sadr, Vida Abedi, Ramin Zand

Research output: Contribution to journalReview articlepeer-review

2 Scopus citations


(1) Background: Atrial fibrillation (AF) is a major risk factor for stroke and is often underdiagnosed, despite being present in 13–26% of ischemic stroke patients. Recently, a significant number of machine learning (ML)-based models have been proposed for AF prediction and detection for primary and secondary stroke prevention. However, clinical translation of these technological innovations to close the AF care gap has been scant. Herein, we sought to systematically examine studies, employing ML models to predict incident AF in a population without prior AF or to detect paroxysmal AF in stroke cohorts to identify key reasons for the lack of translation into the clinical workflow. We conclude with a set of recommendations to improve the clinical translatability of ML-based models for AF. (2) Methods: MEDLINE, Embase, Web of Science,, and ICTRP databases were searched for relevant articles from the inception of the databases up to September 2022 to identify peer-reviewed articles in English that used ML methods to predict incident AF or detect AF after stroke and reported adequate performance metrics. The search yielded 2815 articles, of which 16 studies using ML models to predict incident AF and three studies focusing on ML models to detect AF post-stroke were included. (3) Conclusions: This study highlights that (1) many models utilized only a limited subset of variables available from patients’ health records; (2) only 37% of models were externally validated, and stratified analysis was often lacking; (3) 0% of models and 53% of datasets were explicitly made available, limiting reproducibility and transparency; and (4) data pre-processing did not include bias mitigation and sufficient details, leading to potential selection bias. Low generalizability, high false alarm rate, and lack of interpretability were identified as additional factors to be addressed before ML models can be widely deployed in the clinical care setting. Given these limitations, our recommendations to improve the uptake of ML models for better AF outcomes include improving generalizability, reducing potential systemic biases, and investing in external validation studies whilst developing a transparent modeling pipeline to ensure reproducibility.

Original languageEnglish (US)
Article number1313
JournalJournal of Clinical Medicine
Issue number5
StatePublished - Mar 2024

All Science Journal Classification (ASJC) codes

  • General Medicine

Cite this