TY - GEN
T1 - AllSome Sequence Bloom Trees
AU - Sun, Chen
AU - Harris, Robert S.
AU - Chikhi, Rayan
AU - Medvedev, Paul
N1 - Publisher Copyright: © Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
AB - The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples that have a transcript of interest potentially expressed. In this paper, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing the tree construction time by 52.7% and query time by 39–85%, at the cost of up to 3x memory consumption during queries. Notably, it can query a batch of 198,074 queries in under 8 h (compared to around two days previously) and a whole set of k-mers from a sequencing experiment (about 27 million k-mers) in under 11 min.
UR - http://www.scopus.com/inward/record.url?scp=85018408000&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85018408000&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-56970-3_17
DO - 10.1007/978-3-319-56970-3_17
M3 - Conference contribution
AN - SCOPUS:85018408000
SN - 9783319569697
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 272
EP - 286
BT - Research in Computational Molecular Biology - 21st Annual International Conference, RECOMB 2017, Proceedings
A2 - Sahinalp, S. Cenk
PB - Springer Verlag
T2 - 21st Annual International Conference on Research in Computational Molecular Biology, RECOMB 2017
Y2 - 3 May 2017 through 7 May 2017
ER -