TY - JOUR
T1 - Practical Computational Reproducibility in the Life Sciences
AU - Grüning, Björn
AU - Chilton, John
AU - Köster, Johannes
AU - Dale, Ryan
AU - Soranzo, Nicola
AU - van den Beek, Marius
AU - Goecks, Jeremy
AU - Backofen, Rolf
AU - Nekrutenko, Anton
AU - Taylor, James
N1 - Funding Information:
The authors are grateful to the Bioconda, BioContainers, and Galaxy communities, as without these resources, this work would not be possible. Nate Coraor provided critical advice on the project and edited the manuscript. This project was supported in part by NIH grants U41 HG006620 and R01 AI134384-01, as well as NSF grant 1661497 to J.T., A.N., and J.G. R.D. was supported by the Intramural Program of the National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health. N.S. was supported by the BBSRC Core Capability Grant BB/CCG1720/1 to the Earlham Institute. Additional funding was provided by German Federal Ministry of Education and Research (BMBF grants 031A538A & 031L0101C de.NBI-RBC & de.NBI-epi) to R.B. and B.G.
Publisher Copyright:
© 2018 Elsevier Inc.
PY - 2018/6/27
Y1 - 2018/6/27
N2 - Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components—a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines—to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.
AB - Many areas of research suffer from poor reproducibility, particularly in computationally intensive domains where results rely on a series of complex methodological decisions that are not well captured by traditional publication approaches. Various guidelines have emerged for achieving reproducibility, but implementation of these practices remains difficult due to the challenge of assembling software tools plus associated libraries, connecting tools together into pipelines, and specifying parameters. Here, we discuss a suite of cutting-edge technologies that make computational reproducibility not just possible, but practical in both time and effort. This suite combines three well-tested components—a system for building highly portable packages of bioinformatics software, containerization and virtualization technologies for isolating reusable execution environments for these packages, and workflow systems that automatically orchestrate the composition of these packages for entire pipelines—to achieve an unprecedented level of computational reproducibility. We also provide a practical implementation and five recommendations to help set a typical researcher on the path to performing data analyses reproducibly.
UR - http://www.scopus.com/inward/record.url?scp=85048421153&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048421153&partnerID=8YFLogxK
U2 - 10.1016/j.cels.2018.03.014
DO - 10.1016/j.cels.2018.03.014
M3 - Comment/debate
C2 - 29953862
AN - SCOPUS:85048421153
SN - 2405-4712
VL - 6
SP - 631
EP - 635
JO - Cell Systems
JF - Cell Systems
IS - 6
ER -