TY - JOUR
T1 - Distributions of dimeric tandem repeats in non-coding and coding DNA sequences
AU - Dokholyan, Nikolay V.
AU - Buldyrev, Sergey V.
AU - Havlin, Shlomo
AU - Stanley, H. Eugene
N1 - Funding Information:
We would like to thank R. S. Dokholyan, Dr M. Frank-Kamenetskii, Dr C.-K. Peng, R. Stanley, Dr G. H. Weiss, and Dr R. Wells for fruitful discussions. This work is supported by NIH-HGP. N.V.D. acknowledges support by NIH NRSA molecular biophysics predoctoral traineeship GM08291-09 and by NIH postdoctoral fellowship 1F32 GM20251-01.
PY - 2000/2/21
Y1 - 2000/2/21
N2 - We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails, that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of a dimer. (C) 2000 Academic Press.
AB - We study the length distribution functions for the 16 possible distinct dimeric tandem repeats in DNA sequences of diverse taxonomic partitions of GenBank (known human and mouse genomes, and complete genomes of Caenorhabditis elegans and yeast). For coding DNA, we find that all 16 distribution functions are exponential. For non-coding DNA, the distribution functions for most of the dimeric repeats have surprisingly long tails, that fit a power-law function. We hypothesize that: (i) the exponential distributions of dimeric repeats in protein coding sequences indicate strong evolutionary pressure against tandem repeat expansion in coding DNA sequences; and (ii) long tails in the distributions of dimers in non-coding DNA may be a result of various mutational mechanisms. These long, non-exponential tails in the distribution of dimeric repeats in non-coding DNA are hypothesized to be due to the higher tolerance of non-coding DNA to mutations. By comparing genomes of various phylogenetic types of organisms, we find that the shapes of the distributions are not universal, but rather depend on the specific class of species and the type of a dimer. (C) 2000 Academic Press.
UR - https://www.scopus.com/pages/publications/0034695931
UR - https://www.scopus.com/pages/publications/0034695931#tab=citedBy
U2 - 10.1006/jtbi.1999.1052
DO - 10.1006/jtbi.1999.1052
M3 - Article
C2 - 10666360
AN - SCOPUS:0034695931
SN - 0022-5193
VL - 202
SP - 273
EP - 282
JO - Journal of Theoretical Biology
JF - Journal of Theoretical Biology
IS - 4
ER -