A k-mer-based search engine for sequencing databases

Project: Research project

Project Details

Description

Databases of biological sequencing data hold a treasure trove of information that can be used to aid experimental design and to find relevant prior experiments for new biological projects. However, these rapidly growing archives are heavily under-utilized because the raw data within them cannot be queried quickly. Just as search engines transformed our ability to broadly and deeply access information online, search indices have the potential to revolutionize the ways in which sequencing data in these archives are used. This project will develop methods to enable fast and easy access to these databases. This will contribute to the 'Googlification' of life-science data, spurring broad scientific advances. Additionally, as part of the project, a 'Writing in CS' course and an exercise booklet on the probabilistic analysis of algorithms will be developed. Workshops on emerging methods for sequence analysis will be organized, and training will be provided to underrepresented undergraduates, graduate students, and postdocs.

This project advances research across all areas of life science that work with sequencing data. It creates powerful indexing data structures and querying algorithms for databases of sequencing experiments. These can allow biologists to query sequencing databases to find experiments that express a certain transcript of interest, show differential levels of expression between two transcripts, contain a known splice junction or gene fusion, or contain a small genome of interest. The project will enable biologists to execute many biologically stated queries on a database of raw DNA and RNA sequencing experiments. The results of this project will be available at http://medvedevgroup.com/nsf-iibr-project.
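
As a rough illustration of what a k-mer-based index over sequencing experiments can look like, the sketch below builds a plain inverted index from k-mers to experiment IDs and reports the experiments that contain a sufficient fraction of a query's k-mers. It is not the project's actual data structure: the function names, the threshold parameter `theta`, and the toy reads are all assumptions made for the example, and it ignores reverse complements, sequencing errors, and the compression that a real archive-scale index would need.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(experiments, k):
    """Map each k-mer to the set of experiment IDs whose reads contain it."""
    index = defaultdict(set)
    for exp_id, reads in experiments.items():
        for read in reads:
            for kmer in kmers(read, k):
                index[kmer].add(exp_id)
    return index


def query(index, sequence, k, theta=0.8):
    """Return IDs of experiments holding at least a fraction theta of the query's k-mers."""
    query_kmers = kmers(sequence, k)
    if not query_kmers:
        return []
    hits = defaultdict(int)
    for kmer in query_kmers:
        # .get avoids creating empty entries for k-mers absent from the index
        for exp_id in index.get(kmer, ()):
            hits[exp_id] += 1
    return sorted(e for e, n in hits.items() if n / len(query_kmers) >= theta)


# Hypothetical toy data: two "experiments", each a list of reads.
experiments = {
    "exp_A": ["ACGTACGTGA", "TTGACCA"],
    "exp_B": ["GGGTTTAAAC"],
}
index = build_index(experiments, k=4)
result = query(index, "ACGTACG", k=4)
```

Here the query shares all of its 4-mers with a read in `exp_A` and none with `exp_B`, so only `exp_A` is returned. Thresholding on the fraction of shared k-mers is one common way to make such a lookup tolerant of small differences between the query and the reads; the value 0.8 is an arbitrary choice for this sketch.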

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Active
Effective start/end date: 9/1/22 to 8/31/25

Funding

  • National Science Foundation: $300,570.00
