TY - JOUR
T1 - Dissecting Genomic Determinants of Positive Selection with an Evolution-Guided Regression Model
AU - Huang, Yi Fei
N1 - Publisher Copyright:
© 2021 The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
PY - 2022/1/1
Y1 - 2022/1/1
N2 - In evolutionary genomics, it is fundamentally important to understand how characteristics of genomic sequences, such as gene expression level, determine the rate of adaptive evolution. While numerous statistical methods, such as the McDonald-Kreitman (MK) test, are available to examine the association between genomic features and the rate of adaptation, we currently lack a statistical approach to disentangle the independent effect of a genomic feature from the effects of other correlated genomic features. To address this problem, I present a novel statistical model, the MK regression, which augments the MK test with a generalized linear model. Analogous to the classical multiple regression model, the MK regression can analyze multiple genomic features simultaneously to infer the independent effect of a genomic feature, holding constant all other genomic features. Using the MK regression, I identify numerous genomic features driving positive selection in chimpanzees. These features include well-known ones, such as local mutation rate, residue exposure level, tissue specificity, and immune genes, as well as new features not previously reported, such as gene expression level and metabolic genes. In particular, I show that highly expressed genes may have a higher adaptation rate than their weakly expressed counterparts, even though a higher expression level may impose stronger negative selection. Also, I show that metabolic genes may have a higher adaptation rate than their nonmetabolic counterparts, possibly due to recent changes in diet in primate evolution. Overall, the MK regression is a powerful approach to elucidate the genomic basis of adaptation.
AB - In evolutionary genomics, it is fundamentally important to understand how characteristics of genomic sequences, such as gene expression level, determine the rate of adaptive evolution. While numerous statistical methods, such as the McDonald-Kreitman (MK) test, are available to examine the association between genomic features and the rate of adaptation, we currently lack a statistical approach to disentangle the independent effect of a genomic feature from the effects of other correlated genomic features. To address this problem, I present a novel statistical model, the MK regression, which augments the MK test with a generalized linear model. Analogous to the classical multiple regression model, the MK regression can analyze multiple genomic features simultaneously to infer the independent effect of a genomic feature, holding constant all other genomic features. Using the MK regression, I identify numerous genomic features driving positive selection in chimpanzees. These features include well-known ones, such as local mutation rate, residue exposure level, tissue specificity, and immune genes, as well as new features not previously reported, such as gene expression level and metabolic genes. In particular, I show that highly expressed genes may have a higher adaptation rate than their weakly expressed counterparts, even though a higher expression level may impose stronger negative selection. Also, I show that metabolic genes may have a higher adaptation rate than their nonmetabolic counterparts, possibly due to recent changes in diet in primate evolution. Overall, the MK regression is a powerful approach to elucidate the genomic basis of adaptation.
UR - http://www.scopus.com/inward/record.url?scp=85123649955&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123649955&partnerID=8YFLogxK
U2 - 10.1093/molbev/msab291
DO - 10.1093/molbev/msab291
M3 - Article
C2 - 34597406
AN - SCOPUS:85123649955
SN - 0737-4038
VL - 39
JO - Molecular biology and evolution
JF - Molecular biology and evolution
IS - 1
M1 - msab291
ER -