AllSome Sequence Bloom Trees

Chen Sun, Robert S. Harris, Rayan Chikhi, Paul Medvedev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

15 Scopus citations

Abstract

The ubiquity of next-generation sequencing has transformed the size and nature of many databases, pushing the boundaries of current indexing and searching methods. One particular example is a database of 2,652 human RNA-seq experiments uploaded to the Sequence Read Archive (SRA). Recently, Solomon and Kingsford proposed the Sequence Bloom Tree data structure and demonstrated how it can be used to accurately identify SRA samples in which a transcript of interest is potentially expressed. In this paper, we propose an improvement called the AllSome Sequence Bloom Tree. Results show that our new data structure significantly improves performance, reducing tree construction time by 52.7% and query time by 39–85%, at the cost of up to 3x higher memory consumption during queries. Notably, it can process a batch of 198,074 queries in under 8 h (compared with around two days previously) and query the whole set of k-mers from a sequencing experiment (about 27 million k-mers) in under 11 min.
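As a rough illustration of the kind of query the abstract refers to, the sketch below implements a basic Sequence Bloom Tree in the style of Solomon and Kingsford, not the AllSome variant proposed in this paper (whose node layout the abstract does not describe). It assumes each sample is stored as a Bloom filter of its k-mers, each internal node holds the bitwise OR of its children's filters, and a sample is reported when at least a fraction `theta` of the query's k-mers hit its filter. The names `BloomFilter`, `Node`, `leaf`, `build`, and `query`, the filter size `m`, the hash count `h`, and the simple pairwise tree construction are all hypothetical choices made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


class BloomFilter:
    """Minimal Bloom filter: m bits, h hash functions (illustrative defaults)."""

    def __init__(self, m: int = 4096, h: int = 3) -> None:
        self.m, self.h = m, h
        self.bits = 0  # a Python int used as an m-bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive h bit positions by salting SHA-256 with the hash index.
        for seed in range(self.h):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: str) -> bool:
        # May return false positives, never false negatives.
        return all((self.bits >> p) & 1 for p in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # Bitwise OR of two filters built with the same m and h.
        out = BloomFilter(self.m, self.h)
        out.bits = self.bits | other.bits
        return out


@dataclass
class Node:
    bloom: BloomFilter
    name: Optional[str] = None  # sample name; set only on leaves
    children: List["Node"] = field(default_factory=list)


def kmers(seq: str, k: int) -> Set[str]:
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def leaf(name: str, seq: str, k: int) -> Node:
    bf = BloomFilter()
    for km in kmers(seq, k):
        bf.add(km)
    return Node(bf, name=name)


def build(leaves: List[Node]) -> Node:
    # Hypothetical simplification: pair nodes level by level rather than
    # using the insertion strategy of the published SBT.
    level = list(leaves)
    while len(level) > 1:
        nxt = [Node(level[i].bloom.union(level[i + 1].bloom),
                    children=[level[i], level[i + 1]])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd node is carried up unchanged
        level = nxt
    return level[0]


def query(node: Node, qkmers: Set[str], theta: float) -> List[str]:
    """Names of samples whose filters contain >= theta * |qkmers| query k-mers."""
    hits = sum(1 for km in qkmers if km in node.bloom)
    if hits < theta * len(qkmers):
        return []  # prune: each descendant's bits are a subset of this node's
    if node.name is not None:
        return [node.name]
    return [name for child in node.children for name in query(child, qkmers, theta)]


if __name__ == "__main__":
    samples = {"A": "ACGTACGTTAGCCA", "B": "TTTTGGGGCCCCAAAA", "C": "GATTACAGATTACA"}
    root = build([leaf(n, s, k=5) for n, s in samples.items()])
    print(query(root, kmers("ACGTACGTTA", 5), theta=0.8))
```

Because every internal filter is the union of its children's filters, any k-mer that a leaf reports as present is also reported by each of its ancestors, so pruning a subtree whose root falls below `theta` never discards a matching sample; Bloom filter false positives can, however, still add spurious matches.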

Original language: English (US)
Title of host publication: Research in Computational Molecular Biology - 21st Annual International Conference, RECOMB 2017, Proceedings
Editors: S. Cenk Sahinalp
Publisher: Springer Verlag
Pages: 272-286
Number of pages: 15
ISBN (Print): 9783319569697
State: Published - 2017
Event: 21st Annual International Conference on Research in Computational Molecular Biology, RECOMB 2017 - Hong Kong, China
Duration: May 3, 2017 to May 7, 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10229 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 21st Annual International Conference on Research in Computational Molecular Biology, RECOMB 2017
Country/Territory: China
City: Hong Kong
Period: 5/3/17 to 5/7/17

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
