A comparison of cataloged variation between international HapMap consortium and 1000 genomes project data

Carrie C. Buchanan, Eric S. Torstenson, William S. Bush, Marylyn D. Ritchie

Research output: Contribution to journal (Article, peer-reviewed)

58 Scopus citations


Background: Since the publication of the human genome in 2003, geneticists have studied associations between risk variants and phenotypes to resolve the etiology of traits and complex diseases. The International HapMap Consortium undertook an effort to catalog all common variation across the genome, defined as variants with a minor allele frequency (MAF) of at least 5% in one or more ethnic groups. HapMap, together with advances in genotyping technology, led to genome-wide association studies, which have identified common variants associated with many traits and diseases. In 2008, the 1000 Genomes Project set out to sequence 2500 individuals and to identify rare variants as well as 99% of variants with a MAF of at least 1%.

Methods: To determine whether the 1000 Genomes Project includes all of the variants in HapMap, we examined the overlap between the single nucleotide polymorphisms (SNPs) genotyped in the two resources, using merged phase II/III HapMap data and low-coverage pilot data from 1000 Genomes.

Results: Approximately 72% of HapMap SNPs were also found in the 1000 Genomes Project pilot data. After HapMap variants with a MAF below 5% were filtered out (separately for each population), 99% of the remaining HapMap SNPs were found in the 1000 Genomes data.

Conclusions: Not all variants cataloged in HapMap are also cataloged in 1000 Genomes. This could affect decisions about which resource to use for SNP queries, rare variant validation, or imputation. Both the HapMap and 1000 Genomes Project databases are useful resources for human genetics, but it is important to understand the assumptions made and the filtering strategies employed by these projects.
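The overlap comparison with per-population MAF filtering can be illustrated with a short sketch. This is not the authors' pipeline: it assumes SNPs are matched on chromosome and position (the hypothetical `Snp` key), that both data sets share the same reference coordinates, and that each HapMap population supplies an alternate-allele frequency per SNP. The names `Snp`, `minor_allele_frequency`, and `overlap_by_population` are illustrative only, and the data in the demo are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Snp:
    """A SNP keyed by chromosome and position (hypothetical matching key)."""
    chrom: str
    pos: int


def minor_allele_frequency(alt_freq: float) -> float:
    """MAF of a biallelic site: the frequency of the less common allele."""
    return min(alt_freq, 1.0 - alt_freq)


def overlap_by_population(
    hapmap: Dict[str, Dict[Snp, float]],
    thousand_genomes: Set[Snp],
    maf_threshold: Optional[float] = None,
) -> Dict[str, float]:
    """Fraction of each population's HapMap SNPs that also appear in 1000 Genomes.

    hapmap maps population -> {SNP: alternate-allele frequency in that population}.
    If maf_threshold is given, SNPs whose MAF falls below it in that population
    are dropped before counting, so the filter is applied per population.
    """
    result: Dict[str, float] = {}
    for population, freqs in hapmap.items():
        kept = [
            snp for snp, freq in freqs.items()
            if maf_threshold is None or minor_allele_frequency(freq) >= maf_threshold
        ]
        found = sum(1 for snp in kept if snp in thousand_genomes)
        # Report 0.0 if filtering leaves a population with no SNPs.
        result[population] = found / len(kept) if kept else 0.0
    return result


if __name__ == "__main__":
    # Toy, made-up data for illustration only; not HapMap or 1000 Genomes values.
    hapmap = {
        "CEU": {Snp("1", 100): 0.30, Snp("1", 200): 0.02,
                Snp("2", 300): 0.50, Snp("2", 400): 0.97},
        "YRI": {Snp("1", 100): 0.10, Snp("3", 500): 0.20},
    }
    thousand_genomes = {Snp("1", 100), Snp("2", 300)}
    print(overlap_by_population(hapmap, thousand_genomes))
    print(overlap_by_population(hapmap, thousand_genomes, maf_threshold=0.05))
```

With the toy data, the unfiltered CEU overlap is 0.5 and rises to 1.0 once the two sites with MAF below 5% are removed, while YRI is unchanged because none of its sites fall below the threshold. Matching on position alone ignores allele identity; a stricter comparison would also check that the alleles agree.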

Original language: English (US)
Pages (from-to): 289-294
Number of pages: 6
Journal: Journal of the American Medical Informatics Association
Issue number: 2
State: Published - Mar 2012

All Science Journal Classification (ASJC) codes

  • Health Informatics

