
XPS: FULL: DSD: End-to-end Acceleration of Genomic Workflows on Emerging Heterogeneous Supercomputers

Project: Research project

Project Details

Description

The proposed research harnesses parallelism to accelerate a pervasive bioinformatics workflow: detecting genetic variations. Given DNA sequencing data, this workflow determines the genetic variants present in an individual. Variant detection is an integral part of current genomic data analysis, and several studies have linked genetic variants to diseases. With state-of-the-art software, typical instances of this workflow take several hours to multiple days to complete, and current algorithms and software are unable to exploit even modest levels of hardware parallelism. Most prior approaches to parallelizing and performance-tuning genomic data analysis pipelines have targeted computation, I/O, or network data transfer bottlenecks in isolation, and consequently are limited in the overall performance improvement they can achieve.

This project targets end-to-end acceleration methodologies and uses emerging heterogeneous supercomputers to reduce workflow time-to-completion. It focuses on holistic methodologies that accelerate multiple components of the genetic variant detection workflow. It explores lightweight data reorganizations at multiple granularities to enhance locality, and investigates co-tuning of compute, communication, and I/O tasks, locality-aware load balancing, and coordinated resource partitioning to exploit high-performance computing platforms. A key goal of the proposed research is to design domain-specific optimizations that target the massive parallelism and scalability potential of current heterogeneous supercomputers, so that the developed techniques can be easily transferred to and applied in dedicated academic clusters and commercial computational environments.

Outreach efforts use recruiting workshops to reach undergraduate students and attract them to interdisciplinary graduate programs. Curriculum development activities emphasize cross-layer parallelism.
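The description names locality-aware load balancing but does not say how it is done. As a minimal sketch of the general idea only, not the project's method, assume the reference genome is cut into fixed-size bins in coordinate order, each with a work estimate such as its read count (a hypothetical input). Assigning each worker one contiguous run of bins keeps neighboring reads together, while a binary search on the maximum per-worker load keeps the work balanced. The helper `partition_contiguous` is hypothetical and illustrative.

```python
from typing import List, Tuple


def partition_contiguous(loads: List[int], k: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split coordinate-ordered bins into at most k
    contiguous half-open ranges (start, end) that minimize the largest
    per-range load (classic linear partitioning via binary search)."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not loads:
        return []

    def ranges_for(cap: int) -> List[Tuple[int, int]]:
        # Greedily pack consecutive bins until the next bin would exceed cap.
        # This yields the fewest ranges possible for the given cap.
        ranges: List[Tuple[int, int]] = []
        start, total = 0, 0
        for i, w in enumerate(loads):
            if total + w > cap and i > start:
                ranges.append((start, i))
                start, total = i, 0
            total += w
        ranges.append((start, len(loads)))
        return ranges

    # The optimal cap lies between the heaviest single bin and the total load.
    lo, hi = max(loads), sum(loads)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(ranges_for(mid)) <= k:
            hi = mid  # feasible: try a tighter cap
        else:
            lo = mid + 1  # infeasible: need a looser cap
    return ranges_for(lo)


if __name__ == "__main__":
    # Hypothetical per-bin read counts for seven adjacent genomic bins.
    print(partition_contiguous([5, 1, 1, 5, 2, 2, 4], 3))
```

For the sample loads above, three workers receive the ranges `[(0, 3), (3, 5), (5, 7)]` with loads 7, 7, and 6. Requiring contiguous ranges can cost some balance compared with arbitrary assignment, but each worker then touches a single stretch of coordinate-sorted data, which is the locality benefit the technique aims for.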
For further information, see the project website at http://sites.psu.edu/XPSGenomics
Status: Finished
Effective start/end date: 9/1/14 to 2/29/20

Funding

  • National Science Foundation: $849,984.00
