TY - JOUR
T1 - Densities, length proportions, and other distributional features of repetitive sequences in the human genome estimated from 430 megabases of genomic sequence
AU - Gu, Zhenglong
AU - Wang, Haidong
AU - Nekrutenko, Anton
AU - Li, Wen Hsiung
N1 - Funding Information:
We are grateful to Dr. Arian Smit for providing the current version of RepeatMasker and for discussions. Extensive help from Dr. Xuhua Xia is appreciated. The data provided by the Oak Ridge National Laboratory Computational Biology group greatly facilitated this study. Thanks to Richard Blocker for UNIX system management and to Dr. Richard Hudson for generous permission to use his computational facilities. We thank Dr. Satoshi Ota for discussions. This study was supported by NIH grants GM30998 and GM55759.
PY - 2000/12/23
Y1 - 2000/12/23
N2 - The densities of repetitive elements in the human genome were calculated in each GC content class using non-overlapping windows of 50 kb. The density of Alu is two to three times higher in GC-rich regions than in AT-rich regions, while the opposite is true for LINE1. In contrast, LINE2 and other elements, such as DNA transposons, are more uniformly distributed in the genome. The number of Alus in the human genome was estimated to be 1.4 million, higher than previous estimates. About 40% of the autosomes and ∼51% of the X and Y chromosomes are occupied by repetitive elements. In total, the human genome is estimated to contain more than 4 million repetitive elements. The GC contents (%) of repetitive elements and their flanking regions were also calculated. The GC contents of almost all kinds of repeats are positively correlated with the window GC contents, suggesting that a repetitive sequence is subject to the same mutation pressure as its surrounding regions and therefore tends to have the same GC content. This observation supports the regional mutation hypothesis. The only two exceptions are AluYa and AluYb8, the two youngest Alu subfamilies. The GC content of AluYb8 is negatively correlated with that of its surrounding regions, while AluYa shows no correlation, suggesting different insertion patterns for these two young Alu subfamilies. This suggestion is supported by the fact that the average genetic distance between members of AluYb8 in each GC window class is positively correlated with the GC content of the window, whereas no correlation was found for AluYa. AluYa is more frequent on the Y chromosome than on other chromosomes; the same is true for LTR retroviruses. This pattern might be correlated with the evolutionary history of the Y chromosome.
UR - http://www.scopus.com/inward/record.url?scp=0034707206&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0034707206&partnerID=8YFLogxK
U2 - 10.1016/S0378-1119(00)00434-0
DO - 10.1016/S0378-1119(00)00434-0
M3 - Article
C2 - 11163965
AN - SCOPUS:0034707206
SN - 0378-1119
VL - 259
SP - 81
EP - 88
JO - Gene
JF - Gene
IS - 1-2
ER -