TY - JOUR
T1 - Improved Algorithmic Complexity for the 3SEQ Recombination Detection Algorithm
AU - Lam, Ha Minh
AU - Ratmann, Oliver
AU - Boni, Maciej F.
N1 - Publisher Copyright:
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
PY - 2018/1/1
Y1 - 2018/1/1
N2 - Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn 3) to O(mn 2), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
AB - Identifying recombinant sequences in an era of large genomic databases is challenging as it requires an efficient algorithm to identify candidate recombinants and parents, as well as appropriate statistical methods to correct for the large number of comparisons performed. In 2007, a computation was introduced for an exact nonparametric mosaicism statistic that gave high-precision P values for putative recombinants. This exact computation meant that multiple-comparisons corrected P values also had high precision, which is crucial when performing millions or billions of tests in large databases. Here, we introduce an improvement to the algorithmic complexity of this computation from O(mn 3) to O(mn 2), where m and n are the numbers of recombination-informative sites in the candidate recombinant. This new computation allows for recombination analysis to be performed in alignments with thousands of polymorphic sites. Benchmark runs are presented on viral genome sequence alignments, new features are introduced, and applications outside recombination analysis are discussed.
UR - http://www.scopus.com/inward/record.url?scp=85040551882&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040551882&partnerID=8YFLogxK
U2 - 10.1093/molbev/msx263
DO - 10.1093/molbev/msx263
M3 - Article
C2 - 29029186
AN - SCOPUS:85040551882
SN - 0737-4038
VL - 35
SP - 247
EP - 251
JO - Molecular biology and evolution
JF - Molecular biology and evolution
IS - 1
ER -