TY - GEN
T1 - MedDiffusion
T2 - 2024 SIAM International Conference on Data Mining, SDM 2024
AU - Zhong, Yuan
AU - Cui, Suhan
AU - Wang, Jiaqi
AU - Wang, Xiaochen
AU - Yin, Ziyi
AU - Wang, Yaqing
AU - Xiao, Houping
AU - Huai, Mengdi
AU - Wang, Ting
AU - Ma, Fenglong
N1 - Publisher Copyright: Copyright © 2024 by SIAM.
PY - 2024
Y1 - 2024
N2 - Health risk prediction aims to forecast the potential health risks that patients may face using their historical Electronic Health Records (EHR). Although several effective models have been developed, data insufficiency is a key issue undermining their effectiveness. Various data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through learning underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability. The source code is available via https://shorturl.at/aerT0.
AB - Health risk prediction aims to forecast the potential health risks that patients may face using their historical Electronic Health Records (EHR). Although several effective models have been developed, data insufficiency is a key issue undermining their effectiveness. Various data generation and augmentation methods have been introduced to mitigate this issue by expanding the size of the training data set through learning underlying data distributions. However, the performance of these methods is often limited due to their task-unrelated design. To address these shortcomings, this paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space. Furthermore, MedDiffusion discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data. Experimental evaluation on four real-world medical datasets demonstrates that MedDiffusion outperforms 14 cutting-edge baselines in terms of PR-AUC, F1, and Cohen's Kappa. We also conduct ablation studies and benchmark our model against GAN-based alternatives to further validate the rationality and adaptability of our model design. Additionally, we analyze generated data to offer fresh insights into the model's interpretability. The source code is available via https://shorturl.at/aerT0.
UR - http://www.scopus.com/inward/record.url?scp=85185376153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85185376153&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85185376153
T3 - Proceedings of the 2024 SIAM International Conference on Data Mining, SDM 2024
SP - 499
EP - 507
BT - Proceedings of the 2024 SIAM International Conference on Data Mining, SDM 2024
A2 - Shekhar, Shashi
A2 - Papalexakis, Vagelis
A2 - Gao, Jing
A2 - Jiang, Zhe
A2 - Riondato, Matteo
PB - Society for Industrial and Applied Mathematics Publications
Y2 - 18 April 2024 through 20 April 2024
ER -