TY - GEN
T1 - MedRetriever
T2 - 30th ACM International Conference on Information and Knowledge Management, CIKM 2021
AU - Ye, Muchao
AU - Cui, Suhan
AU - Wang, Yaqing
AU - Luo, Junyu
AU - Xiao, Cao
AU - Ma, Fenglong
N1 - Publisher Copyright: © 2021 ACM.
PY - 2021/10/26
Y1 - 2021/10/26
N2 - The broad adoption of electronic health record (EHR) systems and advances in deep learning technology have motivated the development of health risk prediction models, which mainly depend on the expressiveness and temporal modeling capacity of deep neural networks (DNNs) to improve prediction performance. Some models further augment prediction with external knowledge; however, a great deal of EHR information is inevitably lost during the knowledge mapping. In addition, predictions made by existing models usually lack reliable interpretation, which undermines their reliability in guiding clinical decision-making. To address these challenges, we propose MedRetriever, an effective and flexible framework that leverages unstructured medical text collected from authoritative websites to augment health risk prediction and to provide understandable interpretation. MedRetriever also explicitly takes the target disease documents into consideration, which provide key guidance for the model to learn in a target-driven direction, i.e., from the target disease to the input EHR. Specifically, MedRetriever can flexibly choose its backbone from major predictive models to learn the EHR embedding for each visit. After that, the EHR embedding and features of target disease documents are aggregated into a query by self-attention to retrieve highly relevant text segments from the medical text pool, which is stored in the dynamically updated text memory. Finally, the comprehensive EHR embedding and the text memory are used for prediction and interpretation. We evaluate MedRetriever against nine state-of-the-art approaches across three real-world EHR datasets; it consistently achieves the best performance in AUC and recall and outperforms the best baseline by at least 4.8% in recall on the three test datasets. Furthermore, we conduct case studies to show the easy-to-understand interpretations produced by MedRetriever.
UR - http://www.scopus.com/inward/record.url?scp=85119201062&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85119201062&partnerID=8YFLogxK
U2 - 10.1145/3459637.3482273
DO - 10.1145/3459637.3482273
M3 - Conference contribution
AN - SCOPUS:85119201062
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 2414
EP - 2423
BT - CIKM 2021 - Proceedings of the 30th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
Y2 - 1 November 2021 through 5 November 2021
ER -