TY - JOUR
T1 - Enhancing big data in the social sciences with crowdsourcing
T2 - Data augmentation practices, techniques, and opportunities
AU - Porter, Nathaniel D.
AU - Verdery, Ashton M.
AU - Gaddis, S. Michael
N1 - Publisher Copyright:
© 2020 Porter et al.
PY - 2020/6
Y1 - 2020/6
N2 - Proponents of big data claim it will fuel a social research revolution, but skeptics challenge its reliability and decontextualization. The largest subset of big data is not designed for social research. Data augmentation, the systematic assessment of measurement against known quantities and the expansion of extant data with new information, is an important tool for maximizing such data's validity and research value. Trained research assistants and specialized algorithms are common approaches to augmentation, but they may not scale to big data or appease skeptics. We consider a third alternative: data augmentation with online crowdsourcing. Three empirical cases illustrate the strengths and limitations of crowdsourcing, using Amazon Mechanical Turk to verify automated coding, link online databases, and gather data on online resources. Drawing on these cases, we develop best-practice guidelines and a reporting template to enhance reproducibility. Carefully designed, correctly applied, and rigorously documented crowdsourcing helps address concerns about big data's usefulness for social research.
AB - Proponents of big data claim it will fuel a social research revolution, but skeptics challenge its reliability and decontextualization. The largest subset of big data is not designed for social research. Data augmentation, the systematic assessment of measurement against known quantities and the expansion of extant data with new information, is an important tool for maximizing such data's validity and research value. Trained research assistants and specialized algorithms are common approaches to augmentation, but they may not scale to big data or appease skeptics. We consider a third alternative: data augmentation with online crowdsourcing. Three empirical cases illustrate the strengths and limitations of crowdsourcing, using Amazon Mechanical Turk to verify automated coding, link online databases, and gather data on online resources. Drawing on these cases, we develop best-practice guidelines and a reporting template to enhance reproducibility. Carefully designed, correctly applied, and rigorously documented crowdsourcing helps address concerns about big data's usefulness for social research.
UR - http://www.scopus.com/inward/record.url?scp=85086356089&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086356089&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0233154
DO - 10.1371/journal.pone.0233154
M3 - Article
C2 - 32520948
AN - SCOPUS:85086356089
SN - 1932-6203
VL - 15
JO - PLOS ONE
JF - PLOS ONE
IS - 6
M1 - e0233154
ER -