Block-based Bayesian epistasis association mapping with application to WTCCC type 1 diabetes data

Yu Zhang, Jing Zhang, Jun S. Liu

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


Interactions among multiple genes across the genome may contribute to the risks of many complex human diseases. Whole-genome single nucleotide polymorphism (SNP) data, collected for many thousands of SNP markers from thousands of individuals under a case-control design, promise to shed light on our understanding of such interactions. However, nearby SNPs are highly correlated due to linkage disequilibrium (LD), and the number of possible interactions is too large for exhaustive evaluation. We propose a novel Bayesian method for simultaneously partitioning SNPs into LD-blocks and selecting SNPs within blocks that are associated with the disease, either individually or interactively with other SNPs. When applied to homogeneous population data, the method gives posterior probabilities for LD-block boundaries, which not only result in accurate block partitions of SNPs but also provide measures of partition uncertainty. When applied to case-control data for association mapping, the method implicitly filters out SNP associations created merely by LD with disease loci within the same blocks. Simulation studies showed that this approach is more powerful in detecting multi-locus associations than the other methods we tested, including one of our own. When applied to the WTCCC type 1 diabetes (T1D) data, the method identified many previously known T1D-associated genes, including PTPN22, CTLA4, MHC, and IL2RA. The method also revealed some interesting two-way associations that are not detected by single-SNP methods. Most of the significant associations are located within the MHC region. Our analysis showed that the MHC SNPs form long-distance joint associations over several known recombination hotspots. By controlling for the haplotypes of the MHC class II region, we identified additional associations in both the MHC class I (HLA-A, HLA-B) and class III (BAT1) regions. We also observed significant interactions between the genes PRSS16 and ZNF184 in the extended MHC region and the MHC class II genes.
The proposed method can be broadly applied to the classification problem with correlated discrete covariates.
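The abstract does not give the model details. As a rough illustration of how posterior probabilities for LD-block boundaries can be computed exactly, here is a minimal Python sketch of a generic product-partition model. It is not the authors' model: it assumes phased 0/1 haplotypes, a Dirichlet-multinomial likelihood for the joint allele patterns within each block, independence between blocks, a constant per-block prior weight, and a maximum block length. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def block_log_marginal(haps, i, j, n_states=2, total_conc=1.0):
    """Log marginal likelihood of SNPs i..j-1 treated as one block.

    Assumption: each haplotype's joint allele pattern over the block is a
    draw from a categorical distribution over n_states**(j - i) patterns
    with a symmetric Dirichlet prior of total concentration total_conc.
    """
    counts = Counter(tuple(h[i:j]) for h in haps)
    n = len(haps)
    alpha = total_conc / float(n_states ** (j - i))  # per-pattern concentration
    out = math.lgamma(total_conc) - math.lgamma(total_conc + n)
    for c in counts.values():  # unobserved patterns contribute zero
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def boundary_posteriors(haps, max_block_len=10, log_block_prior=math.log(0.5), n_states=2):
    """Posterior probability of a block boundary between SNP k-1 and SNP k.

    Blocks are assumed independent, so the weight of a partition is a
    product of block scores; forward/backward sums over all partitions give
    exact boundary marginals from at most n_snps * max_block_len block scores.
    """
    n_snps = len(haps[0])
    score = {}
    for i in range(n_snps):
        for j in range(i + 1, min(n_snps, i + max_block_len) + 1):
            score[(i, j)] = log_block_prior + block_log_marginal(haps, i, j, n_states)

    # fwd[j]: log total weight of all partitions of SNPs 0..j-1
    fwd = [0.0] * (n_snps + 1)
    for j in range(1, n_snps + 1):
        fwd[j] = logsumexp([fwd[i] + score[(i, j)]
                            for i in range(max(0, j - max_block_len), j)])

    # bwd[i]: log total weight of all partitions of SNPs i..n_snps-1
    bwd = [0.0] * (n_snps + 1)
    for i in range(n_snps - 1, -1, -1):
        bwd[i] = logsumexp([score[(i, j)] + bwd[j]
                            for j in range(i + 1, min(n_snps, i + max_block_len) + 1)])

    # A boundary at k splits the total into independent left and right parts.
    return [math.exp(fwd[k] + bwd[k] - fwd[n_snps]) for k in range(1, n_snps)]


if __name__ == "__main__":
    # Hypothetical toy data: SNPs 0-2 carry allele a, SNPs 3-5 carry allele b,
    # and each of the four (a, b) combinations appears 10 times.
    haps = [(a,) * 3 + (b,) * 3 for a in (0, 1) for b in (0, 1) for _ in range(10)]
    print([round(p, 3) for p in boundary_posteriors(haps)])
```

In the toy data the partition {0-2}, {3-5} dominates, so the boundary between SNPs 2 and 3 should receive nearly all of the posterior mass, while boundaries inside the perfectly correlated groups should receive almost none. Capping the block length keeps both the pattern space and the recursion small; the boundary probabilities themselves are the kind of partition-uncertainty measure the abstract describes.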

Original language: English (US)
Pages (from-to): 2052-2077
Number of pages: 26
Journal: Annals of Applied Statistics
Issue number: 3
State: Published - Sep 2011

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Modeling and Simulation
  • Statistics, Probability and Uncertainty

