TY - JOUR
T1 - Estimating genetic effects and quantifying missing heritability explained by identified rare-variant associations
AU - Liu, Dajiang J.
AU - Leal, Suzanne M.
N1 - Funding Information:
This research is supported by National Institutes of Health grants MD005964 and HL102926 (S.M.L.). We would like to thank Jonathan Cohen and Helen Hobbs for providing us with data on the ANGPTL family genes from the Dallas Heart Study, which was supported by National Institutes of Health grant RL1HL092550 (J.C.). We would also like to thank Shamil Sunyaev (S.S.) for sharing the simulated genetic datasets from his projects, which were supported by National Institutes of Health grant MH084676 (S.S.). Computation for this research was supported in part by the Shared University Grid at Rice, which was funded by the National Science Foundation under grant EIA-0216467, and by a partnership among Rice University, Sun Microsystems, and Sigma Solutions, Inc.
PY - 2012/10/5
Y1 - 2012/10/5
AB - Next-generation sequencing has led to many complex-trait rare-variant (RV) association studies. Although single-variant association analysis can be performed, it is grossly underpowered. Therefore, researchers have developed many RV association tests that aggregate multiple variant sites across a genetic region (e.g., gene), and test for the association between the trait and the aggregated genotype. After these aggregate tests detect an association, it is only possible to estimate the average genetic effect for a group of RVs. As a result of the "winner's curse," such an estimate can be biased. Although for common variants one can obtain unbiased estimates of genetic parameters by analyzing a replication sample, for RVs it is desirable to obtain unbiased genetic estimates for the study where the association is identified. This is because there can be substantial heterogeneity of RV sites and frequencies even among closely related populations. In order to obtain an unbiased estimate for aggregated RV analysis, we developed bootstrap-sample-split algorithms to reduce the bias of the winner's curse. The unbiased estimates are greatly important for understanding the population-specific contribution of RVs to the heritability of complex traits. We also demonstrate both theoretically and via simulations that for aggregate RV analysis the genetic variance for a gene or region will always be underestimated, sometimes substantially, because of the presence of noncausal variants or because of the presence of causal variants with effects of different magnitudes or directions. Therefore, even if RVs play a major role in the complex-trait etiologies, a portion of the heritability will remain missing, and the contribution of RVs to the complex-trait etiologies will be underestimated.
UR - http://www.scopus.com/inward/record.url?scp=84867249426&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84867249426&partnerID=8YFLogxK
U2 - 10.1016/j.ajhg.2012.08.008
DO - 10.1016/j.ajhg.2012.08.008
M3 - Article
C2 - 23022102
AN - SCOPUS:84867249426
SN - 0002-9297
VL - 91
SP - 585
EP - 596
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
IS - 4
ER -