Genome-wide tagging SNPs with entropy-based Monte Carlo method

Zhenqiu Liu, Shili Lin, Ming Tan

Research output: Contribution to journal › Article › peer-review

11 Scopus citations


The number of common single nucleotide polymorphisms (SNPs) in the human genome is estimated to be around 3-6 million. The study of SNPs is widely expected to help elucidate the genetic component of complex diseases and variable drug responses. High-throughput technologies such as oligonucleotide arrays have produced an enormous amount of SNP data, which creates great challenges for genome-wide disease linkage and association studies. In this paper, we present an adaptation of the cross entropy (CE) method and propose an iterative CE Monte Carlo (CEMC) algorithm for tagging SNP selection. Our method differs from most SNP selection algorithms in the literature in that it is independent of the notion of a haplotype block. Thus, the method is applicable to whole genome SNP selection without prior knowledge of block boundaries. We applied this block-free algorithm to three large datasets (two simulated and one real), each on the order of thousands of SNPs. The successful applications to these large-scale datasets demonstrate that CEMC is computationally feasible for whole genome SNP selection. Furthermore, the results show that CEMC is significantly better than random selection, and that it also outperformed another block-free selection algorithm on the dataset considered.
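As a rough illustration of the cross-entropy idea named in the abstract, the sketch below selects k tagging SNPs by repeatedly sampling candidate subsets, keeping the best-scoring ones, and shifting per-SNP inclusion probabilities toward them. It is not the authors' CEMC algorithm: the abstract does not give the scoring function, sampling scheme, or update rule, so the entropy score in `subset_entropy`, the sampler in `sample_subset`, the driver `ce_select`, and all parameters (`n_samples`, `elite_frac`, `alpha`, `n_iter`) are assumptions made for this example. Haplotypes are assumed to be lists of 0/1 alleles.

```python
import math
import random
from collections import Counter


def subset_entropy(haplotypes, subset):
    """Shannon entropy (bits) of the haplotype patterns seen at the chosen SNPs.

    Assumed stand-in score; the paper's actual entropy measure may differ.
    """
    counts = Counter(tuple(h[j] for j in subset) for h in haplotypes)
    total = len(haplotypes)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def sample_subset(probs, k, rng):
    """Draw k distinct SNP indices, favouring SNPs with higher weight in probs."""
    # Weighted sampling without replacement: key = u ** (1 / weight), keep top k.
    keys = [(rng.random() ** (1.0 / p), j) for j, p in enumerate(probs)]
    keys.sort(reverse=True)
    return sorted(j for _, j in keys[:k])


def ce_select(haplotypes, k, n_samples=200, elite_frac=0.1,
              alpha=0.7, n_iter=30, seed=0):
    """Generic cross-entropy search for a high-entropy subset of k SNPs."""
    rng = random.Random(seed)
    n_snps = len(haplotypes[0])
    probs = [k / n_snps] * n_snps          # start from uniform weights
    n_elite = max(1, int(elite_frac * n_samples))
    best_subset, best_score = None, -1.0

    for _ in range(n_iter):
        # 1. Sample candidate subsets and score each one.
        scored = []
        for _ in range(n_samples):
            s = sample_subset(probs, k, rng)
            scored.append((subset_entropy(haplotypes, s), s))

        # 2. Keep the elite (top-scoring) candidates.
        scored.sort(key=lambda t: t[0], reverse=True)
        elite = scored[:n_elite]
        if elite[0][0] > best_score:
            best_score, best_subset = elite[0]

        # 3. Smooth the weights toward how often each SNP appears in the elite.
        for j in range(n_snps):
            freq = sum(j in s for _, s in elite) / n_elite
            p = alpha * freq + (1 - alpha) * probs[j]
            probs[j] = min(0.99, max(0.01, p))  # keep every SNP reachable

    return best_subset, best_score
```

Calling `ce_select(haplotypes, k=3)` returns the best subset found and its score. Because no haplotype block boundaries enter the computation, the sketch is block-free in the same loose sense as the abstract describes, but it makes no claim about matching the published method's performance.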

Original language: English (US)
Pages (from-to): 1606-1614
Number of pages: 9
Journal: Journal of Computational Biology
Issue number: 9
State: Published - Nov 2006

All Science Journal Classification (ASJC) codes

  • Modeling and Simulation
  • Molecular Biology
  • Genetics
  • Computational Mathematics
  • Computational Theory and Mathematics

