TY - JOUR
T1 - Comparing spatial maps of human population-genetic variation using Procrustes analysis
AU - Wang, Chaolong
AU - Szpiech, Zachary A.
AU - Degnan, James H.
AU - Jakobsson, Mattias
AU - Pemberton, Trevor J.
AU - Hardy, John A.
AU - Singleton, Andrew B.
AU - Rosenberg, Noah A.
KW - multidimensional scaling
KW - population genetics
KW - principal components analysis
KW - Procrustes analysis
N1 - Author Notes and Funding Information: We are grateful to J. Akey and J. Novembre for assistance with the data from their papers. We thank T. Jombart and an anonymous reviewer for comments on the manuscript. This work was supported in part by NIH grants R01 GM081441 and T32 GM070449, by a Burroughs Wellcome Fund Career Award in the Biomedical Sciences, by an Alfred P. Sloan Research Fellowship, and by the Intramural Research Program of the National Institute on Aging, National Institutes of Health, Department of Health and Human Services (project number Z01-AG000932-02).
PY - 2010
Y1 - 2010
N2 - Recent applications of principal components analysis (PCA) and multidimensional scaling (MDS) in human population genetics have found that "statistical maps" based on the genotypes in population-genetic samples often resemble geographic maps of the underlying sampling locations. To provide formal tests of these qualitative observations, we describe a Procrustes analysis approach for quantitatively assessing the similarity of population-genetic and geographic maps. We confirm in two scenarios, one using single-nucleotide polymorphism (SNP) data from Europe and one using SNP data worldwide, that a measurably high level of concordance exists between statistical maps of population-genetic variation and geographic maps of sampling locations. Two other examples illustrate the versatility of the Procrustes approach in population-genetic applications, verifying the concordance of SNP analyses using PCA and MDS, and showing that statistical maps of worldwide copy-number variants (CNVs) accord with statistical maps of SNP variation, especially when CNV analysis is limited to samples with the highest-quality data. As statistical maps with PCA and MDS have become increasingly common for use in summarizing population relationships, our examples highlight the potential of Procrustes-based quantitative comparisons for interpreting the results in these maps.
AB - Recent applications of principal components analysis (PCA) and multidimensional scaling (MDS) in human population genetics have found that "statistical maps" based on the genotypes in population-genetic samples often resemble geographic maps of the underlying sampling locations. To provide formal tests of these qualitative observations, we describe a Procrustes analysis approach for quantitatively assessing the similarity of population-genetic and geographic maps. We confirm in two scenarios, one using single-nucleotide polymorphism (SNP) data from Europe and one using SNP data worldwide, that a measurably high level of concordance exists between statistical maps of population-genetic variation and geographic maps of sampling locations. Two other examples illustrate the versatility of the Procrustes approach in population-genetic applications, verifying the concordance of SNP analyses using PCA and MDS, and showing that statistical maps of worldwide copy-number variants (CNVs) accord with statistical maps of SNP variation, especially when CNV analysis is limited to samples with the highest-quality data. As statistical maps with PCA and MDS have become increasingly common for use in summarizing population relationships, our examples highlight the potential of Procrustes-based quantitative comparisons for interpreting the results in these maps.
UR - http://www.scopus.com/inward/record.url?scp=77249170740&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77249170740&partnerID=8YFLogxK
U2 - 10.2202/1544-6115.1493
DO - 10.2202/1544-6115.1493
M3 - Article
C2 - 20196748
AN - SCOPUS:77249170740
SN - 1544-6115
VL - 9
JO - Statistical Applications in Genetics and Molecular Biology
JF - Statistical Applications in Genetics and Molecular Biology
IS - 1
M1 - 13
ER -