TY - JOUR
T1 - Automating data-model workflows at a level 12 HUC scale
T2 - Watershed modeling in a distributed computing environment
AU - Leonard, Lorne
AU - Duffy, Christopher J.
N1 - Funding Information:
This research was supported in part by the National Science Foundation through XSEDE resources provided by the XSEDE Science Gateways program (TG-EAR120019), NSF EarthCube (GEO-44417482), NSF INSPIRE (IIS-1344272), EPA (96305901), and NOAA (NA10OAR4310166). The authors would like to acknowledge the support from the Institute for CyberScience Director Padma Raghavan and Penn State Institutes for Energy and the Environment Director Tom Richard at The Pennsylvania State University.
Copyright:
Copyright 2014 Elsevier B.V., All rights reserved.
PY - 2014/11
Y1 - 2014/11
N2 - The prototype discussed in this article retrieves Essential Terrestrial Variable (ETV) web services and uses data-model workflows to transform ETV data for hydrological models in a distributed computing environment. The ETV workflow is a service layer to hundreds of terabytes of national datasets bundled for fast data access in support of watershed modeling at the United States Geological Survey (USGS) Hydrological Unit Code (HUC) level-12 scale. The ETV data has been proposed as the Essential Terrestrial Data necessary to construct watershed models anywhere in the continental USA (Leonard and Duffy, 2013). Here, we present the hardware and software system designs to support the ETV, data-model, and model workflows using High Performance Computing (HPC) and service-oriented architecture. This infrastructure design is an important contribution to both how and where the workflows operate. We describe details of how these workflow services operate in a distributed manner for modeling CONUS HUC-12 catchments using the Penn State Integrated Hydrological Model (PIHM) as an example. The prototype is evaluated by generating data-model workflows for every CONUS HUC-12 and creating a repository of workflow provenance for every HUC-12 (~100 km²) for use by researchers as a strategy to begin a new hydrological model study. The concept of provenance for data-model workflows developed here assures reproducibility of model simulations (e.g., reanalysis) from ETV datasets without storing model results, which we have shown would require many petabytes of storage.
AB - The prototype discussed in this article retrieves Essential Terrestrial Variable (ETV) web services and uses data-model workflows to transform ETV data for hydrological models in a distributed computing environment. The ETV workflow is a service layer to hundreds of terabytes of national datasets bundled for fast data access in support of watershed modeling at the United States Geological Survey (USGS) Hydrological Unit Code (HUC) level-12 scale. The ETV data has been proposed as the Essential Terrestrial Data necessary to construct watershed models anywhere in the continental USA (Leonard and Duffy, 2013). Here, we present the hardware and software system designs to support the ETV, data-model, and model workflows using High Performance Computing (HPC) and service-oriented architecture. This infrastructure design is an important contribution to both how and where the workflows operate. We describe details of how these workflow services operate in a distributed manner for modeling CONUS HUC-12 catchments using the Penn State Integrated Hydrological Model (PIHM) as an example. The prototype is evaluated by generating data-model workflows for every CONUS HUC-12 and creating a repository of workflow provenance for every HUC-12 (~100 km²) for use by researchers as a strategy to begin a new hydrological model study. The concept of provenance for data-model workflows developed here assures reproducibility of model simulations (e.g., reanalysis) from ETV datasets without storing model results, which we have shown would require many petabytes of storage.
UR - http://www.scopus.com/inward/record.url?scp=84907308837&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907308837&partnerID=8YFLogxK
U2 - 10.1016/j.envsoft.2014.07.015
DO - 10.1016/j.envsoft.2014.07.015
M3 - Article
AN - SCOPUS:84907308837
SN - 1364-8152
VL - 61
SP - 174
EP - 190
JO - Environmental Modelling and Software
JF - Environmental Modelling and Software
ER -