Quality control procedures for genome-wide association studies

Stephen Turner, Loren L. Armstrong, Yuki Bradford, Christopher S. Carlson, Dana C. Crawford, Andrew T. Crenshaw, Mariza de Andrade, Kimberly F. Doheny, Jonathan L. Haines, Geoffrey Hayes, Gail Jarvik, Lan Jiang, Iftikhar J. Kullo, Rongling Li, Hua Ling, Teri A. Manolio, Martha M. Matsumoto, Catherine A. McCarty, Andrew N. McDavid, Daniel B. Mirel, Justin E. Paschall, Elizabeth W. Pugh, Luke V. Rasmussen, Russell A. Wilke, Rebecca L. Zuvich, Marylyn D. Ritchie

Research output: Contribution to journal › Article › peer-review

258 Scopus citations


Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.

Original language: English (US)
Article number: 1.19
Journal: Current protocols in human genetics
Issue number: SUPPL.68
State: Published - Jan 2011

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics (clinical)

