Scalable Bayesian Nonparametric Method for Clinical Risk Prediction Using Large-Scale Data From Heterogeneous Populations

Research output: Contribution to journal › Article › peer-review

Abstract

While analyzing large clinical datasets allows for the identification of complex patterns to achieve increased risk prediction accuracy, it also presents challenges for existing risk modeling techniques due to patient heterogeneity and the ever-evolving volume and distributions of data. Bayesian nonparametric methods, such as the Dirichlet Process Mixture Model (DPMM), offer a promising solution for modeling data with mixed and overlapping distributions. However, the approach is computationally prohibitive when applied to large datasets, which greatly limits practical applications. In this study, we propose a scalable framework for efficiently constructing DPMMs from large clinical datasets. To improve computational efficiency, we divide the full dataset into smaller subsets and learn DPMMs within individual sets. Additionally, we adopt a recentered pseudo-barycenter to approximate the posterior density of the full dataset and design a new algorithm to generate a consistent clustering rule from the subset posteriors with unequal numbers of components. The method was validated through a simulation study and a case study predicting the survival of heart failure patients post-left ventricular assist device implantation. The results demonstrated improved accuracy compared to benchmark models such as the Cox proportional hazards model and random survival forests. Our modeling framework adaptively clusters patients with distinct risk profiles into subgroups and predicts their probabilities of developing adverse events from overlapping posterior mixtures, providing an effective approach for addressing patient heterogeneity and enhancing risk prediction accuracy.
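The divide-and-fit step described in the abstract can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the paper: synthetic one-dimensional data, a Gaussian likelihood with known variance, a conjugate normal prior on cluster means, and a generic collapsed Gibbs sampler. All function names and parameter values (`split_into_subsets`, `fit_dpmm_1d`, `alpha`, `sigma2`, `mu0`, `tau2`) are hypothetical. The step that combines the subset posteriors (the recentered pseudo-barycenter and the consistent clustering rule) is not shown, because the abstract does not specify it.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def split_into_subsets(data, n_subsets, rng):
    """Shuffle the data and deal it into n_subsets disjoint subsets."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def fit_dpmm_1d(xs, rng, alpha=1.0, sigma2=1.0, mu0=0.0, tau2=25.0, n_sweeps=30):
    """Collapsed Gibbs sampler for a 1-D Gaussian DP mixture (illustrative only).

    Assumes known observation variance sigma2 and a N(mu0, tau2) prior on
    each cluster mean. Returns the final (size, mean) of every cluster.
    """
    z = [0] * len(xs)            # start with all points in one cluster
    counts = {0: len(xs)}
    sums = {0: sum(xs)}
    next_label = 1
    for _ in range(n_sweeps):
        for i, x in enumerate(xs):
            # Remove point i from its current cluster.
            k = z[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k]
                del sums[k]
            # Weight of each existing cluster: size times posterior predictive.
            labels = list(counts)
            weights = []
            for k in labels:
                prec = 1.0 / tau2 + counts[k] / sigma2
                post_mean = (mu0 / tau2 + sums[k] / sigma2) / prec
                weights.append(counts[k] * normal_pdf(x, post_mean, 1.0 / prec + sigma2))
            # Weight of opening a new cluster: alpha times prior predictive.
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(labels):
                k_new = next_label
                next_label += 1
                counts[k_new] = 0
                sums[k_new] = 0.0
            else:
                k_new = labels[choice]
            z[i] = k_new
            counts[k_new] += 1
            sums[k_new] += x
    return [(counts[k], sums[k] / counts[k]) for k in counts]


rng = random.Random(0)
# Synthetic "heterogeneous population": two subgroups with different centers.
data = [rng.gauss(-4.0, 1.0) for _ in range(300)] + [rng.gauss(4.0, 1.0) for _ in range(300)]

subsets = split_into_subsets(data, n_subsets=3, rng=rng)
fits = [fit_dpmm_1d(s, rng) for s in subsets]

for idx, comps in enumerate(fits):
    print(f"subset {idx}: {len(comps)} clusters (size, mean) = {comps}")
```

Because each subset is fitted independently, the fits may return different numbers of clusters, which is the situation the abstract's combination step is designed to handle.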

Original language: English (US)
Pages (from-to): 7514-7524
Number of pages: 11
Journal: IEEE Journal of Biomedical and Health Informatics
Volume: 29
Issue number: 10
State: Published - 2025

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Health Informatics
  • Electrical and Electronic Engineering
  • Health Information Management
