TY - JOUR
T1 - Misclassification in Workers’ Telecommuting Frequency Choices Using a Generalized Extreme Value Model
AU - Balan, Lacramioara Elena
AU - Paleti, Rajesh
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025/5
Y1 - 2025/5
N2 - Telecommuting frequency is a response variable collected in travel surveys and is, therefore, prone to errors leading to mismeasurement or misclassification. Misclassification of explanatory variables is a common risk when using statistical modeling techniques. We define “misclassification” as a response reported or recorded in the wrong category; for example, a value recorded as 1 when it should be 0. In this context, this study develops a statistical model for telecommuting data that accounts for potential misclassification errors, building on existing literature in econometrics. The empirical analysis was undertaken using the 2017 National Household Travel Survey (NHTS) and the generalized extreme value (GEV) models available in the literature. Specifically, the frequency of telecommuting days was analyzed using the negative binomial (NB) model recast as a multinomial logit (MNL) model. By nature, and consistent with other studies, NHTS data are prone to errors that can be classified as intentional or unintentional misinformation provided by the person being interviewed. Ignoring these errors while modeling telecommuting frequencies with standard discrete count models can result in biased parameter estimates. The misclassification parameter was calculated for both over-reporting and under-reporting scenarios. Misclassification rates can be as high as 14% for over-reporting and 10% for under-reporting, particularly between neighboring values. A comparison of statistical fit shows that models that ignore misclassification fit the data worse and yield biased parameter estimates, with significant policy implications.
AB - Telecommuting frequency is a response variable collected in travel surveys and is, therefore, prone to errors leading to mismeasurement or misclassification. Misclassification of explanatory variables is a common risk when using statistical modeling techniques. We define “misclassification” as a response reported or recorded in the wrong category; for example, a value recorded as 1 when it should be 0. In this context, this study develops a statistical model for telecommuting data that accounts for potential misclassification errors, building on existing literature in econometrics. The empirical analysis was undertaken using the 2017 National Household Travel Survey (NHTS) and the generalized extreme value (GEV) models available in the literature. Specifically, the frequency of telecommuting days was analyzed using the negative binomial (NB) model recast as a multinomial logit (MNL) model. By nature, and consistent with other studies, NHTS data are prone to errors that can be classified as intentional or unintentional misinformation provided by the person being interviewed. Ignoring these errors while modeling telecommuting frequencies with standard discrete count models can result in biased parameter estimates. The misclassification parameter was calculated for both over-reporting and under-reporting scenarios. Misclassification rates can be as high as 14% for over-reporting and 10% for under-reporting, particularly between neighboring values. A comparison of statistical fit shows that models that ignore misclassification fit the data worse and yield biased parameter estimates, with significant policy implications.
UR - https://www.scopus.com/pages/publications/85216197101
UR - https://www.scopus.com/inward/citedby.url?scp=85216197101&partnerID=8YFLogxK
U2 - 10.1177/03611981241308867
DO - 10.1177/03611981241308867
M3 - Article
AN - SCOPUS:85216197101
SN - 0361-1981
VL - 2679
SP - 726
EP - 733
JO - Transportation Research Record
JF - Transportation Research Record
IS - 5
ER -