TY - GEN
T1 - MedDiTPro
T2 - 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
AU - Zhong, Yuan
AU - Wang, Xiaochen
AU - Wang, Jiaqi
AU - Zhang, Xiaokun
AU - Ma, Fenglong
N1 - Publisher Copyright: © 2025 ACM.
PY - 2025/8/3
Y1 - 2025/8/3
N2 - Diffusion models have recently emerged as a state-of-the-art approach for synthetic Electronic Health Record (EHR) generation, offering superior fidelity and diversity over traditional generative models. However, existing diffusion-based methods struggle with unique challenges: limited representation learning and modality utilization, where they fail to explicitly capture inter-modality dependencies and fine-grained code-level interactions, and constrained adaptability due to reliance on U-Net-based architectures, which are not well-suited for handling the heterogeneous and evolving nature of EHR data. Furthermore, current evaluation paradigms rely on either perplexity-based sequence modeling or global distributional measures, lacking robustness in assessing both intra-visit code relationships and inter-visit temporal patterns. To address these limitations, we propose MedDiTPro, a diffusion transformer-based framework that enhances multimodal EHR generation by integrating structured modality-aware guidance. Through a unified transformer for intra-visit representation learning, a modality-specific and datawise prompt learner, and a diffusion transformer with structured guidance, MedDiTPro achieves state-of-the-art performance in generating diverse and clinically meaningful synthetic records. Extensive experiments on publicly available datasets demonstrate that MedDiTPro achieves state-of-the-art fidelity, privacy preservation, and utility.
UR - https://www.scopus.com/pages/publications/105014588557
U2 - 10.1145/3711896.3737045
DO - 10.1145/3711896.3737045
M3 - Conference contribution
AN - SCOPUS:105014588557
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 4086
EP - 4097
BT - KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
Y2 - 3 August 2025 through 7 August 2025
ER -