TY - GEN
T1 - Empowering Agroecosystem Modeling with HTC Scientific Workflows
T2 - 2019 IEEE International Conference on Big Data, Big Data 2019
AU - Ferreira da Silva, Rafael
AU - Mayani, Rajiv
AU - Shi, Yuning
AU - Kemanian, Armen R.
AU - Rynge, Mats
AU - Deelman, Ewa
N1 - Funding Information:
This work was funded by the Defense Advanced Research Projects Agency under award #W911NF-18-1-0027; and partly funded by the National Science Foundation under
Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - Scientific workflows have enabled large-scale scientific computations and data analysis, and lowered the entry barrier for performing computations on distributed heterogeneous platforms (e.g., HTC and HPC). In spite of impressive achievements to date, large-scale modeling, simulation, and data analytics in the long tail still face several challenges, such as efficient scheduling and execution of large-scale workflows (O(10^6)) with very short-running tasks (a few seconds). While the current trend to support next-generation workflows on leadership-class machines has gained much attention in the past years, at the other end of the spectrum scientific workflows from long-tail science have become larger and require processing massive volumes of data. In this paper, we report on our experience in designing and implementing an HTC workflow for agroecosystem modeling. We leverage well-known (task clustering and co-scheduling) and emerging (hierarchical workflows and containers) workflow optimization techniques to make the workflow planning problem tractable, and to maximize resource utilization and the degree of task parallelism. Experimental results, via the implementation of a use case, show that by strategically combining the above techniques and defining an appropriate set of optimization parameters, the overall workflow makespan can be improved by 3.5 orders of magnitude when compared to a regular (non-optimized) execution of the workflow.
UR - http://www.scopus.com/inward/record.url?scp=85081285302&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081285302&partnerID=8YFLogxK
U2 - 10.1109/BigData47090.2019.9006107
DO - 10.1109/BigData47090.2019.9006107
M3 - Conference contribution
AN - SCOPUS:85081285302
T3 - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
SP - 4545
EP - 4552
BT - Proceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
A2 - Baru, Chaitanya
A2 - Huan, Jun
A2 - Khan, Latifur
A2 - Hu, Xiaohua Tony
A2 - Ak, Ronay
A2 - Tian, Yuanyuan
A2 - Barga, Roger
A2 - Zaniolo, Carlo
A2 - Lee, Kisung
A2 - Ye, Yanfang Fanny
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 9 December 2019 through 12 December 2019
ER -