MULTIVARIATE REGRESSION WITH RESPONDENT-DRIVEN SAMPLING DATA

Project: Research project

Project Details

Description

Abstract

Many subpopulations of special interest to public health, such as sex workers, are hard to survey. Some are rare and would require a large number of screening interviews to yield a sufficient sample. Others are stigmatized, and their members are unlikely to trust researchers with personal information. Respondent-driven sampling (RDS) is one of the most effective means of sampling such subpopulations. It asks subpopulation members, with incentives, to recruit other members through their personal social networks. It then weights the resulting sample to correct for biases induced by the sampling design, so that inferences about univariate statistics are, under certain conditions, generalizable to the subpopulation of interest. Hundreds of studies have been conducted using RDS, backed by over $166 million in federal funding. The basic RDS methodology has been the subject of several methodological extensions, evaluations, and critiques, but prior statistical work has largely focused on improving estimators for univariate statistics (e.g., the prevalence of a risk factor). We propose to extend this work on statistical estimation in RDS by developing accurate and efficient tools that allow researchers to estimate the parameters of multivariate regression models, thereby improving understanding of hard-to-survey subpopulations. Current practice in multivariate RDS estimation is ad hoc: researchers have applied more than 10 distinct approaches across the literature while offering little or no justification for the approach they chose. RDS methodologists have yet to establish best practices or to evaluate the performance of these approaches. We propose to perform this evaluation. In doing so, this project will enable future RDS studies to address multivariate research questions about hard-to-survey subpopulations, and it will add substantial value to the hundreds of RDS studies that have already been funded and conducted.
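To make the weighting step concrete, here is a minimal Python sketch of one common choice, inverse-degree weights (the Volz-Heckathorn or "RDS-II" estimator), followed by a weighted least-squares fit that uses those weights. This is an illustrative assumption only: the abstract does not say which estimators the project evaluates, and plugging RDS-II weights into a weighted regression is just one of the many ad hoc multivariate approaches found in the literature. All function names and data values are hypothetical.

```python
from typing import List, Sequence


def rds_ii_weights(degrees: Sequence[float]) -> List[float]:
    """Inverse-degree (RDS-II style) weights, rescaled to sum to the sample size."""
    if any(d <= 0 for d in degrees):
        raise ValueError("reported network degrees must be positive")
    inverse = [1.0 / d for d in degrees]
    scale = len(degrees) / sum(inverse)
    return [w * scale for w in inverse]


def weighted_mean(y: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean; with 0/1 outcomes this is an RDS-II prevalence estimate."""
    return sum(w * v for w, v in zip(weights, y)) / sum(weights)


def weighted_least_squares(X: Sequence[Sequence[float]], y: Sequence[float],
                           weights: Sequence[float]) -> List[float]:
    """Solve the weighted normal equations (X'WX) beta = X'Wy.

    Each row of X must already include a leading 1 for the intercept.
    """
    p = len(X[0])
    A = [[sum(w * row[i] * row[j] for row, w in zip(X, weights)) for j in range(p)]
         for i in range(p)]
    b = [sum(w * row[i] * v for row, v, w in zip(X, y, weights)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        residual = b[i] - sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = residual / A[i][i]
    return beta


if __name__ == "__main__":
    degrees = [2, 4, 4, 8]      # hypothetical self-reported network sizes
    risk_factor = [1, 0, 1, 0]  # hypothetical binary outcome
    w = rds_ii_weights(degrees)
    print(weighted_mean(risk_factor, w))  # about 0.667, vs. an unweighted 0.5
    X = [[1, 0], [1, 1], [1, 2], [1, 3]]
    print(weighted_least_squares(X, [1, 3, 5, 7], w))  # about [1.0, 2.0]
```

Because respondents with larger networks are more likely to be recruited, down-weighting them by 1/degree pulls the prevalence estimate toward low-degree members. Note that this sketch yields point estimates only: the usual weighted least-squares standard errors ignore the RDS design, which is part of why the choice among multivariate approaches needs the kind of evaluation proposed here.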
The proposed project has two components that will give researchers (and the public health community) guidance on conducting multivariate analyses with RDS data, along with the tools to carry out these analyses. The first component is a series of simulation studies that evaluate the performance of the most popular multivariate RDS estimators. The simulations will be designed to explore estimator performance across a range of sampling scenarios, from theoretically ideal to more realistic RDS conditions, and across a diversity of network types. The second component is the development and dissemination of software for two commonly used statistical packages (R and Stata) that implements the best-performing multivariate estimators identified in the simulation studies. The data collected in RDS studies have vast untapped potential to improve understanding of specific risk factors in hard-to-survey populations, and the multivariate tools developed under this proposal will help unlock that potential.
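The simulation component could be prototyped along the following lines. This is a hedged Python sketch, not the project's simulation code: the Erdos-Renyi network, the degree-linked trait, the seed and coupon settings, and every function name are illustrative assumptions. It repeatedly draws an idealized RDS sample (recruitment without replacement, a fixed number of coupons per recruiter) and compares the naive sample mean with the inverse-degree (RDS-II) estimate against the known population value.

```python
import random
from collections import deque
from statistics import mean
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]


def random_network(n: int, p: float, rng: random.Random) -> Graph:
    """Erdos-Renyi G(n, p) graph stored as an adjacency dict."""
    adj: Graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def simulate_rds(adj: Graph, n_seeds: int, coupons: int, target: int,
                 rng: random.Random) -> List[int]:
    """Idealized RDS: recruitment without replacement, up to `coupons` recruits each."""
    eligible = [v for v in adj if adj[v]]  # seeds must have at least one tie
    seeds = rng.sample(eligible, n_seeds)
    sample, recruited, queue = list(seeds), set(seeds), deque(seeds)
    while queue and len(sample) < target:
        recruiter = queue.popleft()
        candidates = sorted(v for v in adj[recruiter] if v not in recruited)
        rng.shuffle(candidates)
        for v in candidates[:coupons]:
            if len(sample) >= target:
                break
            recruited.add(v)
            sample.append(v)
            queue.append(v)
    return sample


def one_replicate(seed: int) -> Tuple[float, float, float]:
    """Return (population truth, naive sample mean, RDS-II estimate)."""
    rng = random.Random(seed)
    adj = random_network(500, 0.02, rng)
    # Hypothetical trait correlated with degree, so the naive mean is biased.
    trait = {v: int(len(adj[v]) >= 12) for v in adj}
    truth = mean(trait.values())
    sample = simulate_rds(adj, n_seeds=5, coupons=3, target=150, rng=rng)
    y = [trait[v] for v in sample]
    inv_deg = [1.0 / len(adj[v]) for v in sample]
    rds_ii = sum(w * v for w, v in zip(inv_deg, y)) / sum(inv_deg)
    return truth, mean(y), rds_ii


if __name__ == "__main__":
    reps = [one_replicate(s) for s in range(20)]
    for label, k in (("naive", 1), ("RDS-II", 2)):
        bias = mean(r[k] - r[0] for r in reps)
        print(f"{label} mean bias: {bias:+.3f}")
```

A study of the kind described above would swap in other network generators, add realistic departures such as differential recruitment or misreported degrees, and replace the single prevalence estimate with the multivariate regression estimators under comparison; the replicate loop stays the same.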

Status: Finished
Effective start/end date: 7/1/16 to 6/30/18

Funding

  • National Center for Health Statistics: $78,600.00
