TY - GEN
T1 - Towards Federated COVID-19 Vaccine Side Effect Prediction
AU - Wang, Jiaqi
AU - Qian, Cheng
AU - Cui, Suhan
AU - Glass, Lucas
AU - Ma, Fenglong
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - We propose FedCovid, a new federated learning system based on electronic health records (EHR), to predict COVID-19 vaccination side effects. Federated learning allows diverse data owners to collaboratively train machine learning models without sharing data, which protects the privacy of EHR data. However, directly applying existing federated learning models may fail because of the particular characteristics of EHR data. EHR data is heterogeneous, with numerical and categorical features as well as consecutive visits. Furthermore, data sizes differ across clients, and the labels are imbalanced because only a small number of patients experience serious side effects. To address both challenges simultaneously, FedCovid uses an adaptive approach to fuse heterogeneous EHR data and applies data augmentation together with a margin loss to overcome data imbalance during client model training. In the server update, we propose taking each client’s data size into account to reduce the impact of clients with small data volumes. Finally, to train a stable and effective federated learning model, we propose a new ordinal training technique. Experiments on a real-world dataset show that the proposed model is effective at predicting COVID-19 vaccination side effects. Compared with the best baseline, performance increases by 14.35%, 17.81%, and 129.36% on F1 score, Cohen’s Kappa, and PR-AUC, respectively. The source code of FedCovid is available at https://github.com/JackqqWang/FedCovid.git.
UR - http://www.scopus.com/inward/record.url?scp=85150977468&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150977468&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-26422-1_27
DO - 10.1007/978-3-031-26422-1_27
M3 - Conference contribution
AN - SCOPUS:85150977468
SN - 9783031264214
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 437
EP - 452
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2022, Proceedings
A2 - Amini, Massih-Reza
A2 - Canu, Stéphane
A2 - Fischer, Asja
A2 - Guns, Tias
A2 - Kralj Novak, Petra
A2 - Tsoumakas, Grigorios
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022
Y2 - 19 September 2022 through 23 September 2022
ER -