MedSkim: Denoised Health Risk Prediction via Skimming Medical Claims Data

Suhan Cui, Junyu Luo, Muchao Ye, Jiaqi Wang, Ting Wang, Fenglong Ma

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Scopus citations

Abstract

Health risk prediction is a challenging task that aims to predict whether patients will suffer from a certain disease or condition in the near future based on their historical electronic health record (EHR) data. Although existing approaches achieve strong performance, none of them explicitly deals with the noise in EHR data. In this paper, we hypothesize that automatically removing noise from EHR data should help models further improve their performance. Accordingly, we propose a novel model named MedSkim, which automatically rules out irrelevant visits and codes by skimming through the EHR data. In particular, the proposed model has a code selection module that makes a skipping decision for each individual diagnosis code and removes the target-irrelevant ones. A backward probing RNN (BPRNN) is designed to process the EHR data in reverse order and learn a coarse-grained representation of visits. In addition, a forward skipping RNN (FSRNN) reads the EHR data in the forward direction and dynamically selects important visits and codes based on the outputs of the previous two modules. Finally, the risk prediction module uses the output hidden states from FSRNN to generate the final representation used for prediction. We also design a regularization term based on the skip rate of the model and combine it with the standard cross-entropy loss to train the model end to end. Experimental results show that MedSkim achieves the best performance on three real-world datasets compared with state-of-the-art baselines in terms of PR-AUC, F1, and Cohen's Kappa. Moreover, the ablation study and case study confirm that MedSkim is reasonable and effective for removing noise from EHR data. The source code of MedSkim is available at https://github.com/SH-Src/MedSkim
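The abstract describes a training objective that combines the standard cross-entropy loss with a regularization term based on the model's skip rate, but it does not give the exact form of that term. The following is a minimal sketch of the general idea only, not the authors' implementation: the squared-distance penalty, the `target_rate` and `lam` parameters, and all function names are assumptions introduced for illustration.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over a batch of labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def skip_rate(skip_decisions):
    """Fraction of items (visits or codes) skipped; 1 means skipped, 0 means kept."""
    flat = [d for patient in skip_decisions for d in patient]
    if not flat:
        return 0.0
    return sum(flat) / len(flat)


def combined_loss(y_true, y_prob, skip_decisions, target_rate=0.5, lam=0.1):
    """Cross-entropy plus a skip-rate penalty.

    ASSUMPTION: the penalty is the squared distance between the observed
    skip rate and a hypothetical target rate, weighted by `lam`. The paper's
    actual regularizer may differ.
    """
    ce = binary_cross_entropy(y_true, y_prob)
    reg = (skip_rate(skip_decisions) - target_rate) ** 2
    return ce + lam * reg


if __name__ == "__main__":
    labels = [1, 0]
    probs = [0.5, 0.5]
    # Two patients; each inner list holds per-item skip decisions.
    decisions = [[1, 0], [1, 1]]
    print(combined_loss(labels, probs, decisions))
```

In a trainable model the skip decisions would be discrete, so a differentiable relaxation or a policy-gradient estimator would be needed to backpropagate through them; the sketch above only shows how the two terms combine numerically.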

Original language: English (US)
Title of host publication: Proceedings - 22nd IEEE International Conference on Data Mining, ICDM 2022
Editors: Xingquan Zhu, Sanjay Ranka, My T. Thai, Takashi Washio, Xindong Wu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 81-90
Number of pages: 10
ISBN (Electronic): 9781665450997
DOIs
State: Published - 2022
Event: 22nd IEEE International Conference on Data Mining, ICDM 2022 - Orlando, United States
Duration: Nov 28 2022 to Dec 1 2022

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
Volume: 2022-November
ISSN (Print): 1550-4786

Conference

Conference: 22nd IEEE International Conference on Data Mining, ICDM 2022
Country/Territory: United States
City: Orlando
Period: 11/28/22 to 12/1/22

All Science Journal Classification (ASJC) codes

  • General Engineering
