TY - JOUR
T1 - Distinguishing regulatory DNA from neutral sites
AU - Elnitski, Laura
AU - Hardison, Ross C.
AU - Li, Jia
AU - Yang, Shan
AU - Kolbe, Diana
AU - Eswara, Pallavi
AU - O'Connor, Michael J.
AU - Schwartz, Scott
AU - Miller, Webb
AU - Chiaromonte, Francesca
PY - 2003/1
Y1 - 2003/1
AB - We explore several computational approaches to analyzing interspecies genomic sequence alignments, aiming to distinguish regulatory regions from neutrally evolving DNA. Human-mouse genomic alignments were collected for three sets of human regions: (1) experimentally defined gene regulatory regions, (2) well-characterized exons (coding sequences, as a positive control), and (3) interspersed repeats thought to have inserted before the human-mouse split (a good model for neutrally evolving DNA). Models that potentially could distinguish functional noncoding sequences from neutral DNA were evaluated on these three data sets, as well as bulk genome alignments. Our analyses show that discrimination based on frequencies of individual nucleotide pairs or gaps (i.e., of possible alignment columns) is only partially successful. In contrast, scoring procedures that include the alignment context, based on frequencies of short runs of alignment columns, dramatically improve separation between regulatory and neutral features. Such scoring functions should aid in the identification of putative regulatory regions throughout the human genome.
UR - http://www.scopus.com/inward/record.url?scp=18344415808&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=18344415808&partnerID=8YFLogxK
U2 - 10.1101/gr.817703
DO - 10.1101/gr.817703
M3 - Article
C2 - 12529307
AN - SCOPUS:18344415808
SN - 1088-9051
VL - 13
SP - 64
EP - 72
JO - Genome Research
JF - Genome Research
IS - 1
ER -