Linkage disequilibrium (LD) plays a central role in the fine mapping of disease genes and, more recently, in characterizing haplotype blocks. Classical LD measures, such as D′ and r², are frequently used to quantify the relationship between two loci. Such a measure can be used to construct a pairwise "distance" matrix among a set of loci, and a number of haplotype block detection and tagging single nucleotide polymorphism (SNP) selection algorithms have been built on such matrices. Although these measures have been successful in many applications, their pairwise nature does not allow a direct characterization of joint linkage disequilibrium among multiple loci. Consequently, applications based on them may lose important information. In this report, we propose a multilocus LD measure based on generalized mutual information, a form of relative entropy (Kullback-Leibler distance). In essence, this measure quantifies the distance between the observed haplotype distribution and the distribution expected under linkage equilibrium. We show that, in the special case of two loci, this measure is approximately equal to r². Based on this multilocus LD measure and an entropy measure that characterizes haplotype diversity, we propose a class of stepwise tagging SNP selection algorithms. This is a unified approach to SNP selection in that it accounts for both the haplotype diversity and the linkage disequilibrium objectives. Applications to both simulated and real data demonstrate the utility of the proposed methods for handling a large number of SNPs. The results indicate that multilocus LD patterns can be captured well and that informative, nonredundant SNPs can be selected effectively from a large set of loci.
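To make the two quantities above concrete, the following Python sketch computes a multilocus LD value as the Kullback-Leibler divergence (in nats) between the empirical haplotype distribution and the product of the marginal allele frequencies (the linkage-equilibrium expectation), along with the Shannon entropy of the haplotype distribution as a diversity measure. It is a minimal sketch, assuming phased haplotypes supplied as equal-length tuples of allele codes; all function names are hypothetical, the paper's exact normalization is not stated here, and `greedy_tag_snps` is a simplified entropy-only stand-in, not the authors' stepwise algorithm (which also uses the LD measure).

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Haplotype = Tuple[int, ...]


def haplotype_freqs(haplotypes: Sequence[Sequence[int]]) -> Dict[Haplotype, float]:
    """Empirical haplotype frequencies from phased haplotypes (assumed input)."""
    counts = Counter(tuple(h) for h in haplotypes)
    n = sum(counts.values())
    return {h: c / n for h, c in counts.items()}


def marginal_freqs(freqs: Dict[Haplotype, float], locus: int) -> Dict[int, float]:
    """Allele frequencies at one locus, obtained by summing haplotype frequencies."""
    out: Dict[int, float] = {}
    for h, p in freqs.items():
        out[h[locus]] = out.get(h[locus], 0.0) + p
    return out


def multilocus_ld(haplotypes: Sequence[Sequence[int]]) -> float:
    """KL divergence (nats) of the observed haplotype distribution from the
    linkage-equilibrium distribution (product of marginal allele frequencies)."""
    freqs = haplotype_freqs(haplotypes)
    n_loci = len(next(iter(freqs)))
    margins = [marginal_freqs(freqs, j) for j in range(n_loci)]
    total = 0.0
    # Unobserved haplotypes contribute 0 * log(0) = 0, so only observed ones are summed.
    for h, p in freqs.items():
        q = math.prod(margins[j][h[j]] for j in range(n_loci))
        total += p * math.log(p / q)
    return total


def haplotype_entropy(haplotypes: Sequence[Sequence[int]]) -> float:
    """Shannon entropy (nats) of the haplotype distribution."""
    return -sum(p * math.log(p) for p in haplotype_freqs(haplotypes).values())


def r_squared(haplotypes: Sequence[Sequence[int]]) -> float:
    """Classical pairwise r^2 for two biallelic loci coded 0/1.
    Assumes both loci are polymorphic (otherwise the denominator is zero)."""
    freqs = haplotype_freqs(haplotypes)
    p_a = sum(p for h, p in freqs.items() if h[0] == 1)
    p_b = sum(p for h, p in freqs.items() if h[1] == 1)
    d = freqs.get((1, 1), 0.0) - p_a * p_b
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def greedy_tag_snps(haplotypes: Sequence[Sequence[int]], k: int) -> List[int]:
    """Hypothetical entropy-only stepwise selection (NOT the paper's algorithm):
    repeatedly add the SNP that maximizes haplotype entropy of the chosen subset.
    Ties are broken by the lowest locus index."""
    n_loci = len(haplotypes[0])
    selected: List[int] = []
    for _ in range(min(k, n_loci)):
        candidates = [j for j in range(n_loci) if j not in selected]
        best = max(
            candidates,
            key=lambda j: haplotype_entropy(
                [tuple(h[i] for i in selected + [j]) for h in haplotypes]
            ),
        )
        selected.append(best)
    return selected
```

With this unscaled definition, two biallelic loci in weak LD give a divergence close to r²/2: a second-order expansion of the KL divergence yields half the chi-square divergence, which for a 2×2 haplotype table equals r². For instance, with 100 haplotypes split 30/20/20/30 over (1,1), (1,0), (0,1), (0,0), r² = 0.04 while twice the divergence is about 0.0403. The approximation degrades under strong LD, and any normalization the paper applies may differ.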