Rare disease prediction by generating quality-assured electronic health records*

Fenglong Ma, Yaqing Wang, Jing Gao, Houping Xiao, Jing Zhou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

Predicting diseases for patients is an important and practical task in healthcare informatics. Existing disease prediction models focus on common diseases, for which enough EHR data and prior medical knowledge are available for analysis. However, those models may not work for rare disease prediction, because it is extremely hard to collect enough EHR data for such diseases. To tackle this issue, we design a novel rare disease prediction system that not only generates EHR data but also automatically selects high-quality generated data to further improve predictive performance. The system has three components: data generation, data selection, and prediction. In particular, we propose MaskEHR to generate diverse EHR data based on the data from patients suffering from the given diseases. To remove noisy information from the generated EHR data, we further design a reinforcement learning-based data selector, called RL-Selector, which automatically chooses high-quality generated EHR data. Finally, the prediction component identifies patients who may potentially suffer from the given diseases. These three components work together and enhance each other. Experiments on three real healthcare datasets show that the proposed system outperforms existing approaches on the rare disease prediction task.
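The generate-then-select flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how MaskEHR fills in masked content or how RL-Selector is trained, so the random code replacement in `mask_and_fill`, the overlap score in `plausibility`, and the fixed `threshold` are all hypothetical stand-ins chosen only to show how the components might connect. The record layout (a list of visits, each a set of diagnosis codes) is likewise an assumption.

```python
import random
from typing import List, Set

# Assumed layout: one patient record is a list of visits,
# and each visit is a set of diagnosis codes.
Record = List[Set[str]]


def mask_and_fill(record: Record, vocab: List[str], mask_rate: float,
                  rng: random.Random) -> Record:
    """Toy generator: replace a random fraction of codes with codes from vocab.

    Hypothetical stand-in for a learned generator such as MaskEHR.
    """
    new_record: Record = []
    for visit in record:
        new_visit: Set[str] = set()
        for code in sorted(visit):  # sorted so results are reproducible
            if rng.random() < mask_rate:
                new_visit.add(rng.choice(vocab))  # "masked" code gets a replacement
            else:
                new_visit.add(code)
        new_record.append(new_visit)
    return new_record


def plausibility(candidate: Record, real_codes: Set[str]) -> float:
    """Hypothetical quality score: share of the candidate's codes seen in real records."""
    codes: Set[str] = set().union(*candidate)
    if not codes:
        return 0.0
    return len(codes & real_codes) / len(codes)


def augment(real_positives: List[Record], vocab: List[str], n_per_record: int,
            mask_rate: float, threshold: float, seed: int = 0) -> List[Record]:
    """Generate candidates, keep the plausible ones, and return the augmented set.

    The threshold filter stands in for a learned selector such as RL-Selector.
    """
    rng = random.Random(seed)
    real_codes = {c for rec in real_positives for visit in rec for c in visit}
    generated = [mask_and_fill(rec, vocab, mask_rate, rng)
                 for rec in real_positives for _ in range(n_per_record)]
    kept = [g for g in generated if plausibility(g, real_codes) >= threshold]
    return real_positives + kept
```

In a full system the augmented set would be passed to the prediction component as extra positive training examples, and the selector would be updated from the predictor's performance rather than from a fixed score.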

Original language: English (US)
Title of host publication: Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020
Editors: Carlotta Demeniconi, Nitesh Chawla
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 514-522
Number of pages: 9
ISBN (Electronic): 9781611976236
State: Published - 2020
Event: 2020 SIAM International Conference on Data Mining, SDM 2020 - Cincinnati, United States
Duration: May 7 2020 → May 9 2020

Publication series

Name: Proceedings of the 2020 SIAM International Conference on Data Mining, SDM 2020

Conference

Conference: 2020 SIAM International Conference on Data Mining, SDM 2020
Country/Territory: United States
City: Cincinnati
Period: 5/7/20 → 5/9/20

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Software
