TY - JOUR
T1 - Robust genome-wide ancestry inference for heterogeneous datasets
T2 - illustrated using the 1,000 genome project with 3D facial images
AU - Li, Jiarui
AU - Zarzar, Tomás González
AU - White, Julie D.
AU - Indencleef, Karlijne
AU - Hoskens, Hanne
AU - Matthews, Harry
AU - Nauwelaers, Nele
AU - Zaidi, Arslan
AU - Eller, Ryan J.
AU - Herrick, Noah
AU - Günther, Torsten
AU - Svensson, Emma M.
AU - Jakobsson, Mattias
AU - Walsh, Susan
AU - Van Steen, Kristel
AU - Shriver, Mark D.
AU - Claes, Peter
N1 - Funding Information:
Jiarui Li is supported in part by a Chinese Research PhD scholarship. This investigation was also supported by KU Leuven, BOF, FWO Flanders (G078518N), and the NIH (1-RO1-DE027023). Kristel Van Steen acknowledges research opportunities offered by F.N.R.S. (Convention no. T.0180.13) and by the interuniversity research institute Walloon Excellence in Lifesciences and BIOtechnology (WELBIO, Convention de Recherche no. WELBIO-CR-2015S-03R). The collaborators at Penn State University were supported in part by grants from the Center for Human Evolution and Development at Penn State, the Science Foundation of Ireland Walton Fellowship (04. W4/B643), the United States National Institute of Justice (https://www.nij.gov; 2008-DN-BX-K125), and the United States Department of Defense (https://www.defense.gov). Torsten Günther is supported by the Swedish Research Council (2017-05267).
Publisher Copyright:
© 2020, The Author(s).
PY - 2020/12/1
Y1 - 2020/12/1
AB - Estimates of individual-level genomic ancestry are routinely used in human genetics and related fields. The analysis of population structure and genomic ancestry can yield insights into modern and ancient populations, allowing us to address questions regarding admixture and the numbers and identities of the parental source populations. Unrecognized population structure is also an important confounder to correct for in genome-wide association studies. However, it remains challenging to work with heterogeneous datasets from multiple studies collected by different laboratories with diverse genotyping and imputation protocols. This work presents a new approach and an accompanying open-source toolbox that facilitate a robust integrative analysis of population structure and genomic ancestry estimates for heterogeneous datasets. We show robustness against individual outliers and different protocols for the projection of new samples into a reference ancestry space, and the ability to reveal and adjust for population structure in a simulated case–control admixed population. Given that visually evident and easily recognizable patterns of human facial characteristics co-vary with genomic ancestry, and based on the integration of three different sources of genome data, we generate average 3D faces to illustrate genomic ancestry variations within the 1000 Genomes Project and for eight ancient-DNA profiles.
UR - http://www.scopus.com/inward/record.url?scp=85088120743&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088120743&partnerID=8YFLogxK
U2 - 10.1038/s41598-020-68259-w
DO - 10.1038/s41598-020-68259-w
M3 - Article
C2 - 32678112
AN - SCOPUS:85088120743
SN - 2045-2322
VL - 10
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 11850
ER -