Project Details
Description
The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) powers the
next generation of computational genomic research. The AnVIL makes available several of the most widely
used analysis environments for genomics and biomedical research including Bioconductor, RStudio, Galaxy,
Jupyter, Cromwell, and IGV in a secure, scalable, and accessible cloud-based environment. It currently houses
>600,000 genomic samples from the largest NHGRI projects including the Centers for Common Disease
Genomics (CCDG), the Centers for Mendelian Disease Genomics (CMG), the Telomere-to-Telomere (T2T)
consortium, and the Genotype-Tissue Expression (GTEx) project. Our user-centered solution for data access,
analysis, and visualization enables investigators across all levels of expertise to fully utilize genomic datasets
using environments they are already familiar with, leveraging well-engineered and optimized scientific
computing infrastructure for greater efficiency and lower costs. In this second phase of the AnVIL, we will
add several high-value services and capabilities to the AnVIL experience, with the goal of
increasing the number of researchers using the platform and the depth of their research. In Aim 1, we will
enhance the core platform in several innovative ways. First, to support researchers transitioning into the cloud
environment, we will work to simplify and optimize the research environment with new dashboards for
monitoring costs and managing teams, along with optimizations to the APIs to run in a multi-cloud
environment. Next, we will optimize Galaxy in AnVIL to improve the user experience, enable cost-efficient
computing, develop a workflow recommender system, and enable interoperability by integrating computing
services across multiple clouds. Within Bioconductor, we will introduce new capabilities for reliable software
engineering practices; enhance accessibility through monographs, curriculum authoring, and Shiny apps; and
optimize Bioconductor infrastructure and development for the cloud. Additionally, we will design and
implement standards to ensure AnVIL is interoperable with other cloud-based research systems. In Aim 2, we
will introduce four new scientific services to support critical analysis tasks. These include services for enhanced
machine learning capabilities, data harmonization and metadata autocompletion, new liftover services to
translate genomic knowledge between reference genomes, and comprehensive variant discovery and analysis
using long-read sequencing. In Aim 3, we will expand our efforts for training and outreach. This will begin
with focused high-impact events including community workshops and the AnVIL Champions Program, with
the goal of seeding and developing community-driven support. We will also create scalable, accessible videos
and massive open online courses (MOOCs) leveraging new educational infrastructure we are developing. In
Aim 4, we will continue our joint leadership with our AnVIL partners at the Broad Institute and welcome our
new partners in the forthcoming AnVIL Clinical Resource (ACR) program.
| Status | Finished |
|---|---|
| Effective start/end date | 9/21/18 → 4/30/24 |
Funding
- National Human Genome Research Institute: $2,000,000.00
- National Human Genome Research Institute: $3,088,142.00