TY - JOUR
T1 - Domain-adaptive neural networks improve cross-species prediction of transcription factor binding
AU - Cochran, Kelly
AU - Srivastava, Divyanshi
AU - Shrikumar, Avanti
AU - Balsubramani, Akshay
AU - Hardison, Ross C.
AU - Kundaje, Anshul
AU - Mahony, Shaun
N1 - Publisher Copyright:
© 2022 Cochran et al.
PY - 2022/3
Y1 - 2022/3
AB - The intrinsic DNA sequence preferences and cell type–specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type–specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species–specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.
UR - http://www.scopus.com/inward/record.url?scp=85125683060&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85125683060&partnerID=8YFLogxK
DO - 10.1101/gr.275394.121
M3 - Article
C2 - 35042722
AN - SCOPUS:85125683060
SN - 1088-9051
VL - 32
SP - 512
EP - 523
JO - Genome Res
JF - Genome Research
IS - 3
ER -