TY - GEN
T1 - Accurate Detection of Tandem Repeats from Error-Prone Sequences with EquiRep
AU - Song, Zhezheng
AU - Zahin, Tasfia
AU - Li, Xiang
AU - Shao, Mingfu
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Several critical tasks in biology, such as detecting tandem repeats from error-prone long reads and reconstructing circular RNAs from rolling-circle long-read data, require solving a fundamental computational problem: given a sequence containing an unknown number of mutated copies of an unknown repeat unit, reconstruct the original unit. While several methods exist for this problem, they often exhibit low accuracy when the repeat unit length increases or the number of copies is low. Furthermore, methods capable of handling highly mutated sequences remain scarce, highlighting a significant need for improvement. We introduce EquiRep, a tool for accurate detection of tandem repeats from error-prone sequences. Through evaluation on simulated and real datasets, we show that EquiRep consistently outperforms state-of-the-art methods. EquiRep is robust to sequencing errors and makes better predictions for long units and low frequencies, underscoring its broad usability. EquiRep is freely available at https://github.com/Shao-Group/EquiRep. The full version of this manuscript is available at https://doi.org/10.1101/2024.11.05.621953.
UR - https://www.scopus.com/pages/publications/105004253727
UR - https://www.scopus.com/pages/publications/105004253727#tab=citedBy
U2 - 10.1007/978-3-031-90252-9_46
DO - 10.1007/978-3-031-90252-9_46
M3 - Conference contribution
AN - SCOPUS:105004253727
SN - 9783031902512
T3 - Lecture Notes in Computer Science
SP - 390
EP - 394
BT - Research in Computational Molecular Biology - 29th International Conference, RECOMB 2025, Proceedings
A2 - Sankararaman, Sriram
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on Research in Computational Molecular Biology, RECOMB 2025
Y2 - 26 April 2025 through 29 April 2025
ER -