TY - JOUR
T1 - Ecological prediction at macroscales using big data
T2 - Does sampling design matter?
AU - Soranno, Patricia A.
AU - Cheruvelil, Kendra Spence
AU - Liu, Boyang
AU - Wang, Qi
AU - Tan, Pang Ning
AU - Zhou, Jiayu
AU - King, Katelyn B.S.
AU - McCullough, Ian M.
AU - Stachelek, Jemma
AU - Bartley, Meridith
AU - Filstrup, Christopher T.
AU - Hanks, Ephraim M.
AU - Lapierre, Jean François
AU - Lottig, Noah R.
AU - Schliep, Erin M.
AU - Wagner, Tyler
AU - Webster, Katherine E.
N1 - Funding Information:
Author contributions are as follows. P. A. Soranno and K. S. Cheruvelil are co-lead authors and contributed equally to the manuscript by leading the conceptualization and writing of the manuscript. After the co-leads, there are four groups of authors in decreasing level of contribution, with authors listed in alphabetical order within each group. (1) Q. Wang and B. Liu performed the analysis, with (2) P-N. Tan and J. Zhou as supervisors. (3) K. B. S. King, I. M. McCullough, and J. Stachelek performed database queries, summaries, created tables and figures, and the code repository. (4) The remaining authors, in addition to those in groups 1–3, contributed to the development, editing, and writing of the paper. The authors declare that they have no conflict of interest. Further, we wish to thank Autumn Poisson and all participants from the 2018 Continental Limnology Project Workshop at Pennsylvania State University, including Emily Stanley, Nicole Smith, Nathan Wikle, Sarah Collins, and Claire Boudreau. Thanks to Meredith and Justin Holgerson for their feedback and providing information about the FIA program, to Allie Shoffner for her editorial suggestions, and to the anonymous reviewers whose suggestions improved this manuscript. Funding was provided by the US NSF Macrosystems Biology Program grants, DEB-1638679; DEB-1638550, DEB-1638539, DEB-1638554. PAS was also supported by USDA National Institute of Food and Agriculture Hatch Project, Grant Number: 176820. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Publisher Copyright:
© 2020 by the Ecological Society of America
PY - 2020/9/1
Y1 - 2020/9/1
AB - Although ecosystems respond to global change at regional to continental scales (i.e., macroscales), model predictions of ecosystem responses often rely on data from targeted monitoring of a small proportion of sampled ecosystems within a particular geographic area. In this study, we examined how the sampling strategy used to collect data for such models influences predictive performance. We subsampled a large and spatially extensive data set to investigate how macroscale sampling strategy affects prediction of ecosystem characteristics in 6,784 lakes across a 1.8-million-km² area. We estimated model predictive performance for different subsets of the data set to mimic three common sampling strategies for collecting observations of ecosystem characteristics: random sampling design, stratified random sampling design, and targeted sampling. We found that sampling strategy influenced model predictive performance such that (1) stratified random sampling designs did not improve predictive performance compared to simple random sampling designs and (2) although one of the scenarios that mimicked targeted (non-random) sampling had the poorest performing predictive models, the other targeted sampling scenarios resulted in models with similar predictive performance to that of the random sampling scenarios. Our results suggest that although potential biases in data sets from some forms of targeted sampling may limit predictive performance, compiling existing spatially extensive data sets can result in models with good predictive performance that may inform a wide range of science questions and policy goals related to global change.
UR - http://www.scopus.com/inward/record.url?scp=85084140561&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084140561&partnerID=8YFLogxK
U2 - 10.1002/eap.2123
DO - 10.1002/eap.2123
M3 - Article
C2 - 32160362
AN - SCOPUS:85084140561
SN - 1051-0761
VL - 30
JO - Ecological Applications
JF - Ecological Applications
IS - 6
M1 - e02123
ER -