Project Details
Description
Databases of biological sequencing data hold a treasure trove of information that can be used to aid experimental design and to find relevant prior experiments for new biological projects. However, these rapidly growing archives are heavily under-utilized because the raw data within them cannot be queried rapidly. Just as search engines transformed our ability to broadly and deeply access information online, search indices have the potential to revolutionize the ways in which sequencing data in these archives is used. In this project, methods that enable fast and easy access to these databases will be developed. This will contribute to the 'Googlification' of life-science data, spurring broad scientific advances. Additionally, as part of the project, a 'Writing in CS' course as well as an exercise booklet for probabilistic analysis of algorithms will be developed. Workshops on emerging methods for sequence analysis will be organized, and training will be provided to underrepresented undergraduates, graduate students, and postdocs.
This project advances research across all areas of life science that work with sequencing data. It creates powerful indexing data structures and querying algorithms for databases of sequencing experiments. These can allow biologists to query sequencing databases to find experiments that express a certain transcript of interest, show differential levels of expression between two transcripts, contain a known splice junction or gene fusion, or contain a small genome of interest. This project will enable biologists to execute many biologically stated queries on a database of raw DNA and RNA sequencing experiments. The results of this project will be available at http://medvedevgroup.com/nsf-iibr-project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status | Active |
---|---|
Effective start/end date | 9/1/22 → 8/31/25 |
Funding
- National Science Foundation: $300,570.00