TY - JOUR
T1 - Learning the properties of adaptive regions with functional data analysis
AU - Mughal, Mehreen R.
AU - Koch, Hillary
AU - Huang, Jinguo
AU - Chiaromonte, Francesca
AU - DeGiorgio, Michael
N1 - Publisher Copyright: © 2020 Mughal et al.
PY - 2020/8/27
Y1 - 2020/8/27
N2 - Identifying regions of positive selection in genomic data remains a challenge in population genetics. Most current approaches rely on comparing values of summary statistics calculated in windows. We present an approach termed SURFDAWave, which translates measures of genetic diversity calculated in genomic windows to functional data. By transforming our discrete data points to be outputs of continuous functions defined over genomic space, we are able to learn the features of these functions that signify selection. This enables us to confidently identify complex modes of natural selection, including adaptive introgression. We are also able to predict important selection parameters that are responsible for shaping the inferred selection events. By applying our model to human population-genomic data, we recapitulate previously identified regions of selective sweeps, such as OCA2 in Europeans, and predict that its beneficial mutation reached a frequency of 0.02 before it swept 1,802 generations ago, a time when humans were relatively new to Europe. In addition, we identify BNC2 in Europeans as a target of adaptive introgression, and predict that it harbors a beneficial mutation that arose in an archaic human population that split from modern humans within the hypothesized modern human-Neanderthal divergence range.
UR - http://www.scopus.com/inward/record.url?scp=85090833698&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090833698&partnerID=8YFLogxK
U2 - 10.1371/JOURNAL.PGEN.1008896
DO - 10.1371/JOURNAL.PGEN.1008896
M3 - Article
C2 - 32853200
AN - SCOPUS:85090833698
SN - 1553-7390
VL - 16
JO - PLoS Genetics
JF - PLoS Genetics
IS - 8
M1 - e1008896
ER -