TY - JOUR
T1 - Crowdsourcing Utilizing Subgroup Structure of Latent Factor Modeling
AU - Xu, Qi
AU - Yuan, Yubai
AU - Wang, Junhui
AU - Qu, Annie
N1 - Publisher Copyright: © 2023 American Statistical Association.
PY - 2024
Y1 - 2024
N2 - Crowdsourcing has emerged as an alternative solution for collecting large-scale labels. However, the majority of recruited workers are not domain experts, so their contributed labels could be noisy. In this article, we propose a two-stage model to predict the true labels for multicategory classification tasks in crowdsourcing. In the first stage, we fit the observed labels with a latent factor model and incorporate subgroup structures for both tasks and workers through a multi-centroid grouping penalty. Group-specific rotations are introduced to align workers with different task categories to solve multicategory crowdsourcing tasks. In the second stage, we propose a concordance-based approach to identify high-quality worker subgroups who are relied upon to assign labels to tasks. In theory, we show the estimation consistency of the latent factors and the prediction consistency of the proposed method. The simulation studies show that the proposed method outperforms the existing competitive methods, assuming the subgroup structures within tasks and workers. We also demonstrate the application of the proposed method to real-world problems and show its superiority. Supplementary materials for this article are available online.
AB - Crowdsourcing has emerged as an alternative solution for collecting large-scale labels. However, the majority of recruited workers are not domain experts, so their contributed labels could be noisy. In this article, we propose a two-stage model to predict the true labels for multicategory classification tasks in crowdsourcing. In the first stage, we fit the observed labels with a latent factor model and incorporate subgroup structures for both tasks and workers through a multi-centroid grouping penalty. Group-specific rotations are introduced to align workers with different task categories to solve multicategory crowdsourcing tasks. In the second stage, we propose a concordance-based approach to identify high-quality worker subgroups who are relied upon to assign labels to tasks. In theory, we show the estimation consistency of the latent factors and the prediction consistency of the proposed method. The simulation studies show that the proposed method outperforms the existing competitive methods, assuming the subgroup structures within tasks and workers. We also demonstrate the application of the proposed method to real-world problems and show its superiority. Supplementary materials for this article are available online.
UR - http://www.scopus.com/inward/record.url?scp=85150756543&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85150756543&partnerID=8YFLogxK
U2 - 10.1080/01621459.2023.2178925
DO - 10.1080/01621459.2023.2178925
M3 - Article
AN - SCOPUS:85150756543
SN - 0162-1459
VL - 119
SP - 1192
EP - 1204
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 546
ER -