TY - JOUR
T1 - Network structure and biased variance estimation in respondent driven sampling
AU - Verdery, Ashton M.
AU - Mouw, Ted
AU - Bauldry, Shawn
AU - Mucha, Peter J.
N1 - Funding Information:
We thank Mason Porter for providing access to the Facebook data set we use. We also thank participants at the 2012 Hidden and Hard to Reach Populations Conference, Jonathan Daw, and Charles Seguin for helpful comments on earlier drafts. We are grateful to the Carolina Population Center for training (T32 HD007168) and general support (R24 HD050924) and to the Penn State Population Research Institute (R24HD041025). Peter J. Mucha was supported by the National Science Foundation (DMS-0645369) and by Award Number R21GM099493 from the National Institute of General Medical Sciences. This research uses data from Add Health, a program project directed by Kathleen Mullan Harris and designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill, and funded by grant P01-HD31921 from the Eunice Kennedy Shriver National Institute of Child Health and Human Development, with cooperative funding from 23 other federal agencies and foundations. Special acknowledgment is due Ronald R. Rindfuss and Barbara Entwisle for assistance in the original design. Information on how to obtain the Add Health data files is available on the Add Health website (http://www.cpc.unc.edu/addhealth). No direct support was received from grant P01-HD31921 for this analysis. The content is solely the responsibility of the authors and does not necessarily represent the official views of any aforementioned funding agencies. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Publisher Copyright:
© 2015 Verdery et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2015/12/1
Y1 - 2015/12/1
N2 - This paper explores bias in the estimation of sampling variance in Respondent Driven Sampling (RDS). Prior methodological work on RDS has focused on its problematic assumptions and the biases and inefficiencies of its estimators of the population mean. Nonetheless, researchers have given only slight attention to the topic of estimating sampling variance in RDS, despite the importance of variance estimation for the construction of confidence intervals and hypothesis tests. In this paper, we show that the estimators of RDS sampling variance rely on a critical assumption that the network is First Order Markov (FOM) with respect to the dependent variable of interest. We demonstrate, through intuitive examples, mathematical generalizations, and computational experiments that current RDS variance estimators will always underestimate the population sampling variance of RDS in empirical networks that do not conform to the FOM assumption. Analysis of 215 observed university and school networks from Facebook and Add Health indicates that the FOM assumption is violated in every empirical network we analyze, and that these violations lead to substantially biased RDS estimators of sampling variance. We propose and test two alternative variance estimators that show some promise for reducing biases, but which also illustrate the limits of estimating sampling variance with only partial information on the underlying population social network.
UR - http://www.scopus.com/inward/record.url?scp=84957596687&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84957596687&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0145296
DO - 10.1371/journal.pone.0145296
M3 - Article
C2 - 26679927
AN - SCOPUS:84957596687
SN - 1932-6203
VL - 10
JO - PLoS ONE
JF - PLoS ONE
IS - 12
M1 - e0145296
ER -