Project Details
Description
The proposed research harnesses parallelism to accelerate the
pervasive bioinformatics workflow of detecting genetic variations.
This workflow determines the genetic variants present in an
individual, given DNA sequencing data. The variant detection workflow
is an integral part of current genomic data analysis, and several
studies have linked genetic variants to diseases. Typical instances
of this workflow currently take several hours to multiple days to
complete with state-of-the-art software, and current algorithms and
software cannot exploit even modest levels of hardware
parallelism. Most prior approaches to parallelization and
performance tuning of genomic data analysis pipelines have targeted
computation, I/O, or network data transfer bottlenecks in isolation
and are consequently limited in the overall performance improvement
they can achieve. This project targets end-to-end acceleration
methodologies and uses emerging heterogeneous supercomputers to
reduce workflow time-to-completion.
The project focuses on holistic methodologies to accelerate multiple
components within the genetic variant detection workflow. It explores
lightweight data reorganizations at multiple granularities to enhance
locality, and investigates co-tuning of compute, communication, and
I/O tasks, locality-aware load balancing, and coordinated resource
partitioning to exploit high-performance computing platforms. A key
goal of the proposed research is to design domain-specific
optimizations targeting the massive parallelism and scalability
potential of current heterogeneous supercomputers, so that the
developed techniques can be easily transferred and applied to
dedicated academic clusters and commercial computational environments.
Outreach efforts target undergraduate students through recruiting
workshops, with the aim of attracting them to interdisciplinary
graduate programs.
Curriculum development activities emphasize cross-layer parallelism.
For further information, see the project website at
http://sites.psu.edu/XPSGenomics
Status | Finished |
---|---|
Effective start/end date | 9/1/14 → 2/29/20 |
Funding
- National Science Foundation: $849,984.00