Expanding the AnVIL (Analysis, Visualization, and Informatics Lab-space)

  • Schatz, Michael M (PI)
  • Nekrutenko, Anton (CoPI)
  • Hoffman, Ava A.M (CoPI)
  • Taylor, Casey C.O (CoPI)
  • Afgan, Enis (CoPI)
  • Tan, Frederick J. (CoPI)
  • Leek, Jeffrey T. (CoPI)
  • Goecks, Jeremy J. (CoPI)
  • Waldron, Levi (CoPI)
  • Morgan, Martin T. (CoPI)
  • Taylor, James Peter (CoPI)
  • Carey, Vincent J. (CoPI)

Project: Research project

Project Details


The NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) powers the next generation of computational genomic research. The AnVIL makes available several of the most widely used analysis environments for genomics and biomedical research, including Bioconductor, RStudio, Galaxy, Jupyter, Cromwell, and IGV, in a secure, scalable, and accessible cloud-based environment. It currently houses more than 600,000 genomic samples from the largest NHGRI projects, including the Centers for Common Disease Genomics (CCDG), the Centers for Mendelian Disease Genomics (CMG), the Telomere-to-Telomere (T2T) consortium, and the Genotype-Tissue Expression (GTEx) project. Our user-centered solution for data access, analysis, and visualization enables investigators at all levels of expertise to fully utilize genomic datasets in environments they already know, leveraging well-engineered and optimized scientific computing infrastructure for greater efficiency and lower costs. In this second phase of the AnVIL, we will add several high-value services and capabilities, with the goal of increasing both the number of researchers using the platform and the depth of their research. In Aim 1, we will enhance the core platform in several innovative ways. First, to support researchers transitioning into the cloud environment, we will simplify and optimize the research environment with new dashboards for monitoring costs and managing teams, along with optimizations to the APIs to run in a multi-cloud environment. Next, we will optimize Galaxy in AnVIL to improve the user experience, enable cost-efficient computing, develop a workflow recommender system, and enable interoperability by integrating computing services across multiple clouds.
Within Bioconductor, we will introduce new capabilities for reliable software engineering practices; enhance accessibility through monographs, curriculum authoring, and Shiny apps; and optimize Bioconductor infrastructure and development for the cloud. Additionally, we will design and implement standards to ensure AnVIL is interoperable with other cloud-based research systems. In Aim 2, we will introduce four new scientific services to support critical analysis tasks: enhanced machine learning capabilities, data harmonization and metadata autocompletion, new liftover services to translate genomic knowledge between reference genomes, and comprehensive variant discovery and analysis using long-read sequencing. In Aim 3, we will expand our efforts for training and outreach. This will begin with focused, high-impact events, including community workshops and the AnVIL Champions Program, with the goal of seeding and developing community-driven support. We will also create scalable, accessible videos and massive open online courses (MOOCs), leveraging new educational infrastructure we are developing. In Aim 4, we will continue our joint leadership with our AnVIL partners at the Broad, and welcome our new partners in the forthcoming AnVIL Clinical Resource (ACR) program.
Effective start/end date: 9/21/18 to 4/30/24


  • National Human Genome Research Institute: $2,000,000.00
  • National Human Genome Research Institute: $3,088,142.00

