TY - JOUR
T1 - An explainable model of host genetic interactions linked to COVID-19 severity
AU - GEN-COVID Multicenter Study
AU - Onoja, Anthony
AU - Picchiotti, Nicola
AU - Fallerini, Chiara
AU - Baldassarri, Margherita
AU - Fava, Francesca
AU - Mari, Francesca
AU - Daga, Sergio
AU - Benetti, Elisa
AU - Bruttini, Mirella
AU - Palmieri, Maria
AU - Croci, Susanna
AU - Amitrano, Sara
AU - Meloni, Ilaria
AU - Frullanti, Elisa
AU - Doddato, Gabriella
AU - Lista, Mirjam
AU - Beligni, Giada
AU - Valentino, Floriana
AU - Zguro, Kristina
AU - Tita, Rossella
AU - Giliberti, Annarita
AU - Mencarelli, Maria Antonietta
AU - Lo Rizzo, Caterina
AU - Pinto, Anna Maria
AU - Ariani, Francesca
AU - Di Sarno, Laura
AU - Montagnani, Francesca
AU - Tumbarello, Mario
AU - Rancan, Ilaria
AU - Fabbiani, Massimiliano
AU - Rossetti, Barbara
AU - Bergantini, Laura
AU - D’Alessandro, Miriana
AU - Cameli, Paolo
AU - Bennett, David
AU - Anedda, Federico
AU - Marcantonio, Simona
AU - Scolletta, Sabino
AU - Franchi, Federico
AU - Mazzei, Maria Antonietta
AU - Guerrini, Susanna
AU - Conticini, Edoardo
AU - Cantarini, Luca
AU - Frediani, Bruno
AU - Tacconi, Danilo
AU - Spertilli Raffaelli, Chiara
AU - Feri, Marco
AU - Donati, Alice
AU - Scala, Raffaele
AU - Chiaromonte, Francesca
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
AB - We employed a multifaceted computational strategy to identify the genetic factors contributing to increased risk of severe COVID-19 infection from a Whole Exome Sequencing (WES) dataset of a cohort of 2000 Italian patients. We coupled a stratified k-fold screening, to rank variants more associated with severity, with the training of multiple supervised classifiers, to predict severity based on screened features. Feature importance analysis from tree-based models allowed us to identify 16 variants with the highest support which, together with age and gender covariates, were found to be most predictive of COVID-19 severity. When tested on a follow-up cohort, our ensemble of models predicted severity with high accuracy (ACC = 81.88%; AUCROC = 96%; MCC = 61.55%). Our model recapitulated a vast literature of emerging molecular mechanisms and genetic factors linked to COVID-19 response and extends previous landmark Genome-Wide Association Studies (GWAS). It revealed a network of interplaying genetic signatures converging on established immune system and inflammatory processes linked to viral infection response. It also identified additional processes cross-talking with immune pathways, such as GPCR signaling, which might offer additional opportunities for therapeutic intervention and patient stratification. Publicly available PheWAS datasets revealed that several variants were significantly associated with phenotypic traits such as “Respiratory or thoracic disease”, supporting their link with COVID-19 severity outcome.
UR - http://www.scopus.com/inward/record.url?scp=85140779245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140779245&partnerID=8YFLogxK
U2 - 10.1038/s42003-022-04073-6
DO - 10.1038/s42003-022-04073-6
M3 - Article
C2 - 36289370
AN - SCOPUS:85140779245
SN - 2399-3642
VL - 5
JO - Communications Biology
JF - Communications Biology
IS - 1
M1 - 1133
ER -