We analyze the histograms of the lengths of the 16 possible distinct repeats of identical dimers, known as dimeric tandem repeats, in DNA sequences. For coding regions, the probability of finding a repetitive sequence of l copies of a particular dimer decreases exponentially as l increases, and the distribution functions can be well fit by a first-order Markov process. For noncoding regions, by contrast, the distribution functions for most of the 16 dimers have long tails and can be approximated by power-law functions. We propose a model, based on known biophysical processes, that leads to the observed probability distribution functions for noncoding DNA. We argue that this difference in the shape of the distribution functions between coding and noncoding DNA arises because noncoding DNA is more tolerant of evolutionary mutational alterations than coding DNA.
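As an illustration of the kind of histogram described above, the sketch below counts maximal tandem runs of a single dimer in a sequence and tallies them by copy number l. It is a minimal sketch, not the authors' procedure: the function name `dimer_repeat_histogram` is hypothetical, and the greedy left-to-right scan (a run ends at the first position where the next two bases differ from the dimer) is an assumed counting convention.

```python
from collections import Counter


def dimer_repeat_histogram(seq: str, dimer: str) -> Counter:
    """Tally maximal tandem runs of `dimer` in `seq`, keyed by copy number l.

    Hypothetical helper. Assumed convention: a greedy left-to-right scan,
    where each run is extended two bases at a time and the scan resumes
    right after the run ends.
    """
    if len(dimer) != 2:
        raise ValueError("dimer must have length 2")
    seq = seq.upper()
    dimer = dimer.upper()
    hist: Counter = Counter()
    i = 0
    n = len(seq)
    while i <= n - 2:
        if seq[i:i + 2] == dimer:
            copies = 0
            # Count consecutive copies; a slice past the end is shorter
            # than the dimer, so the loop stops at the sequence boundary.
            while seq[i + 2 * copies:i + 2 * copies + 2] == dimer:
                copies += 1
            hist[copies] += 1
            i += 2 * copies  # skip past the whole run
        else:
            i += 1
    return hist


if __name__ == "__main__":
    # One run of 3 copies of "CA" and one isolated copy.
    print(dimer_repeat_histogram("GCACACATTCAGG", "CA"))
```

Run over every dimer, this yields the 16 length histograms. Why a first-order Markov fit implies exponential decay can be seen from a standard property of such chains. For a heterodimer XY, each additional copy multiplies the probability of the run by p(X|Y)p(Y|X). This factor is independent of l, so P(l) falls off geometrically, that is, exponentially in l.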

Original language: English (US)
Pages (from-to): 5182-5185
Number of pages: 4
Journal: Physical Review Letters
Issue number: 25
State: Published - Jan 1 1997

All Science Journal Classification (ASJC) codes

  • General Physics and Astronomy


Title: Distribution of base pair repeats in coding and noncoding DNA sequences