TY - JOUR
T1 - An evaluation of geo-located Twitter data for measuring human migration
AU - Yin, Junjun
AU - Gao, Yizhao
AU - Chi, Guangqing
N1 - Funding Information:
This research was supported in part by the National Science Foundation [Awards #1541136, #1823633, and #1927827]; the Eunice Kennedy Shriver National Institute of Child Health and Human Development [Award #P2C HD041025]; the USDA National Institute of Food and Agriculture and Multistate Research Project #PEN04623 [Accession #1013257]; and the Social Science Research Institute, Population Research Institute, and Institute for Computational and Data Sciences of the Pennsylvania State University. The authors thank the editor and the anonymous reviewers for their constructive comments on earlier versions of the article. Appreciation is extended to Dr. Jennifer Van Hook at The Pennsylvania State University for her many helpful suggestions.
Publisher Copyright:
© 2022 Informa UK Limited, trading as Taylor & Francis Group.
PY - 2022
Y1 - 2022
N2 - This study evaluates the spatial patterns of flows generated from geo-located Twitter data to measure human migration. Using geo-located tweets continuously collected in the U.S. from 2013 to 2015, we identified Twitter users who migrated per changes in county-of-residence every two years and compared the Twitter-estimated county-to-county migration flows with the ones from the U.S. Internal Revenue Service (IRS). To evaluate the spatial patterns of Twitter migration flows when representing the IRS counterparts, we developed a normalized difference representation index to visualize and identify those counties of over-/under-representations in the Twitter estimates. Further, we applied a multidimensional spatial scan statistic approach based on a Poisson process model to detect pairs of origin and destination regions where the over-/under-representativeness occurred. The results suggest that Twitter migration flows tend to under-represent the IRS estimates in regions with a large population and over-represent them in metropolitan regions adjacent to tourist attractions. This study demonstrated that geo-located Twitter data could be a sound statistical proxy for measuring human migration. Given that the spatial patterns of Twitter-estimated migration flows vary significantly across the geographic space, related studies will benefit from our approach by identifying those regions where data calibration is necessary.
UR - http://www.scopus.com/inward/record.url?scp=85131948215&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131948215&partnerID=8YFLogxK
U2 - 10.1080/13658816.2022.2075878
DO - 10.1080/13658816.2022.2075878
M3 - Article
C2 - 36643847
AN - SCOPUS:85131948215
SN - 1365-8816
VL - 36
SP - 1830
EP - 1852
JO - International Journal of Geographical Information Science
JF - International Journal of Geographical Information Science
IS - 9
ER -