TY - GEN
T1 - Multimodal ensemble approach to incorporate various types of clinical notes for predicting readmission
AU - Shin, Bonggun
AU - Hogan, Julien
AU - Adams, Andrew B.
AU - Lynch, Raymond J.
AU - Patzer, Rachel E.
AU - Choi, Jinho D.
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - Electronic Health Records (EHRs) have been heavily used for various downstream clinical prediction tasks such as readmission or mortality. One of the modalities in EHRs, clinical notes, has not been fully explored for these tasks due to its unstructured and inexplicable nature. Although recent advances in deep learning (DL) enable models to extract interpretable features from unstructured data, they often require a large amount of training data. However, many tasks in medical domains inherently involve small samples of lengthy documents; in kidney transplantation, for example, data from only a few thousand patients are available, and each patient's document consists of a couple of million words in major hospitals. Thus, complex DL methods cannot be applied to these kinds of domains. In this paper, we present a comprehensive ensemble model using vector space modeling and topic modeling. Our proposed model is evaluated on the readmission task for kidney transplant patients and improves the c-statistic by 0.0211 over the previous state-of-the-art approach using structured data, while typical DL methods fail to beat this approach. The proposed architecture provides an interpretable score for each feature from both modalities, structured and unstructured data, which is shown to be meaningful through a physician's evaluation.
AB - Electronic Health Records (EHRs) have been heavily used for various downstream clinical prediction tasks such as readmission or mortality. One of the modalities in EHRs, clinical notes, has not been fully explored for these tasks due to its unstructured and inexplicable nature. Although recent advances in deep learning (DL) enable models to extract interpretable features from unstructured data, they often require a large amount of training data. However, many tasks in medical domains inherently involve small samples of lengthy documents; in kidney transplantation, for example, data from only a few thousand patients are available, and each patient's document consists of a couple of million words in major hospitals. Thus, complex DL methods cannot be applied to these kinds of domains. In this paper, we present a comprehensive ensemble model using vector space modeling and topic modeling. Our proposed model is evaluated on the readmission task for kidney transplant patients and improves the c-statistic by 0.0211 over the previous state-of-the-art approach using structured data, while typical DL methods fail to beat this approach. The proposed architecture provides an interpretable score for each feature from both modalities, structured and unstructured data, which is shown to be meaningful through a physician's evaluation.
UR - http://www.scopus.com/inward/record.url?scp=85073002143&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85073002143&partnerID=8YFLogxK
U2 - 10.1109/BHI.2019.8834640
DO - 10.1109/BHI.2019.8834640
M3 - Conference contribution
AN - SCOPUS:85073002143
T3 - 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings
BT - 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019
Y2 - 19 May 2019 through 22 May 2019
ER -