TY - JOUR
T1 - Boosting Deep Learning for Interpretable Brain MRI Lesion Detection through the Integration of Radiology Report Information
AU - Dai, Lisong
AU - Lei, Jiayu
AU - Ma, Fenglong
AU - Sun, Zheng
AU - Du, Haiyan
AU - Zhang, Houwang
AU - Jiang, Jingxuan
AU - Wei, Jianyong
AU - Wang, Dan
AU - Tan, Guang
AU - Song, Xinyu
AU - Zhu, Jinyu
AU - Zhao, Qianqian
AU - Ai, Songtao
AU - Shang, Ai
AU - Li, Zhaohui
AU - Zhang, Ya
AU - Li, Yuehua
N1 - Publisher Copyright:
© 2024, Radiological Society of North America Inc. All rights reserved.
PY - 2024/11
Y1 - 2024/11
AB - Purpose: To guide the attention of a deep learning (DL) model toward MRI characteristics of brain lesions by incorporating radiology report–derived textual features to achieve interpretable lesion detection. Materials and Methods: In this retrospective study, 35 282 brain MRI scans (January 2018 to June 2023) and corresponding radiology reports from center 1 were used for training, validation, and internal testing. A total of 2655 brain MRI scans (January 2022 to December 2022) from centers 2–5 were reserved for external testing. Textual features were extracted from radiology reports to guide a DL model (ReportGuidedNet) focusing on lesion characteristics. Another DL model (PlainNet) without textual features was developed for comparative analysis. Both models identified 15 conditions, including 14 diseases and normal brains. Performance of each model was assessed by calculating macro-averaged area under the receiver operating characteristic curve (ma-AUC) and micro-averaged AUC (mi-AUC). Attention maps, which visualized model attention, were assessed with a five-point Likert scale. Results: ReportGuidedNet outperformed PlainNet for all diagnoses on both internal (ma-AUC, 0.93 [95% CI: 0.91, 0.95] vs 0.85 [95% CI: 0.81, 0.88]; mi-AUC, 0.93 [95% CI: 0.90, 0.95] vs 0.89 [95% CI: 0.83, 0.92]) and external (ma-AUC, 0.91 [95% CI: 0.88, 0.93] vs 0.75 [95% CI: 0.72, 0.79]; mi-AUC, 0.90 [95% CI: 0.87, 0.92] vs 0.76 [95% CI: 0.72, 0.80]) testing sets. The performance difference between internal and external testing sets was smaller for ReportGuidedNet than for PlainNet (Δma-AUC, 0.03 vs 0.10; Δmi-AUC, 0.02 vs 0.13). The Likert scale score of ReportGuidedNet was higher than that of PlainNet (mean ± SD: 2.50 ± 1.09 vs 1.32 ± 1.20; P < .001). Conclusion: The integration of radiology report textual features improved the ability of the DL model to detect brain lesions, thereby enhancing interpretability and generalizability.
UR - http://www.scopus.com/inward/record.url?scp=85211478919&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85211478919&partnerID=8YFLogxK
DO - 10.1148/ryai.230520
M3 - Article
AN - SCOPUS:85211478919
SN - 2638-6100
VL - 6
JO - Radiology: Artificial Intelligence
JF - Radiology: Artificial Intelligence
IS - 6
M1 - e230520
ER -