TY - JOUR
T1 - A rarefaction approach for measuring population differences in rare and common variation
AU - Cotter, Daniel J.
AU - Hofgard, Elyssa F.
AU - Novembre, John
AU - Szpiech, Zachary A.
AU - Rosenberg, Noah A.
N1 - Publisher Copyright:
© The Author(s) 2023. Published by Oxford University Press on behalf of The Genetics Society of America.
PY - 2023/6
Y1 - 2023/6
N2 - In studying allele-frequency variation across populations, it is often convenient to classify an allelic type as “rare,” with nonzero frequency less than or equal to a specified threshold, “common,” with a frequency above the threshold, or entirely unobserved in a population. When sample sizes differ across populations, however, especially if the threshold separating “rare” and “common” corresponds to a small number of observed copies of an allelic type, discreteness effects can lead a sample from one population to possess substantially more rare allelic types than a sample from another population, even if the two populations have extremely similar underlying allele-frequency distributions across loci. We introduce a rarefaction-based sample-size correction for use in comparing rare and common variation across multiple populations whose sample sizes potentially differ. We use our approach to examine rare and common variation in worldwide human populations, finding that the sample-size correction introduces subtle differences relative to analyses that use the full available sample sizes. We introduce several ways in which the rarefaction approach can be applied: we explore the dependence of allele classifications on subsample sizes, we permit more than two classes of allelic types of nonzero frequency, and we analyze rare and common variation in sliding windows along the genome. The results can assist in clarifying similarities and differences in allele-frequency patterns across populations.
UR - http://www.scopus.com/inward/record.url?scp=85160455736&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160455736&partnerID=8YFLogxK
U2 - 10.1093/genetics/iyad070
DO - 10.1093/genetics/iyad070
M3 - Article
C2 - 37075098
AN - SCOPUS:85160455736
SN - 0016-6731
VL - 224
JO - Genetics
JF - Genetics
IS - 2
M1 - iyad070
ER -