TY - JOUR
T1 - An exact nonparametric method for inferring mosaic structure in sequence triplets
AU - Boni, Maciej F.
AU - Posada, David
AU - Feldman, Marcus W.
PY - 2007/6
Y1 - 2007/6
AB - Statistical tests for detecting mosaic structure or recombination among nucleotide sequences usually rely on identifying a pattern or a signal that would be unlikely to appear under clonal reproduction. Dozens of such tests have been described, but many are hampered by long running times, confounding of selection and recombination, and/or inability to isolate the mosaic-producing event. We introduce a test that is exact, nonparametric, rapidly computable, free of the infinite-sites assumption, able to distinguish between recombination and variation in mutation/fixation rates, and able to identify the breakpoints and sequences involved in the mosaic-producing event. Our test considers three sequences at a time: two parent sequences that may have recombined, with one or two breakpoints, to form the third sequence (the child sequence). Excess similarity of the child sequence to a candidate recombinant of the parents is a sign of recombination; we take the maximum value of this excess similarity as our test statistic Δ_{m,n,b}. We present a method for rapidly calculating the distribution of Δ_{m,n,b} and demonstrate that it has comparable power to and a much improved running time over previous methods, especially in detecting recombination in large data sets.
UR - http://www.scopus.com/inward/record.url?scp=34250757252&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250757252&partnerID=8YFLogxK
U2 - 10.1534/genetics.106.068874
DO - 10.1534/genetics.106.068874
M3 - Article
C2 - 17409078
AN - SCOPUS:34250757252
SN - 0016-6731
VL - 176
SP - 1035
EP - 1047
JO - Genetics
JF - Genetics
IS - 2
ER -