TY - GEN
T1 - Probabilistic Models for Fine-Grained Truth Discovery from Crowdsourced Data
AU - Ma, Fenglong
AU - Gao, Jing
N1 - Funding Information:
This work was sponsored by NSF IIS-1319973. We thank all collaborators: Yaliang Li, Qi Li, Minghui Qiu, Shi Zhi, Lu Su, Bo Zhao, Heng Ji and Jiawei Han.
PY - 2016/1/29
Y1 - 2016/1/29
N2 - In crowdsourced data aggregation tasks, the answers provided by large numbers of sources to the same set of questions often conflict. The most important challenge in this task is to estimate source reliability and select the answers provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same degree of reliability on all questions, ignoring the fact that a source's reliability may vary significantly across topics. To capture varying expertise levels on different topics, we propose three fine-grained truth discovery models for aggregating conflicting data collected from multiple users/sources: a parametric probabilistic model (FaitCrowd), a non-parametric probabilistic model, and a topical influence-aware model. These probabilistic models jointly model the process of generating question content and sources' provided answers, so that fine-grained expertise and true answers are estimated simultaneously. This leads to a more precise estimation of source reliability. As a result, these models are better able to obtain true answers for the questions than existing approaches.
AB - In crowdsourced data aggregation tasks, the answers provided by large numbers of sources to the same set of questions often conflict. The most important challenge in this task is to estimate source reliability and select the answers provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same degree of reliability on all questions, ignoring the fact that a source's reliability may vary significantly across topics. To capture varying expertise levels on different topics, we propose three fine-grained truth discovery models for aggregating conflicting data collected from multiple users/sources: a parametric probabilistic model (FaitCrowd), a non-parametric probabilistic model, and a topical influence-aware model. These probabilistic models jointly model the process of generating question content and sources' provided answers, so that fine-grained expertise and true answers are estimated simultaneously. This leads to a more precise estimation of source reliability. As a result, these models are better able to obtain true answers for the questions than existing approaches.
UR - http://www.scopus.com/inward/record.url?scp=84964734734&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964734734&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2015.109
DO - 10.1109/ICDMW.2015.109
M3 - Conference contribution
AN - SCOPUS:84964734734
T3 - Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015
SP - 1556
EP - 1557
BT - Proceedings - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015
A2 - Wu, Xindong
A2 - Tuzhilin, Alexander
A2 - Xiong, Hui
A2 - Dy, Jennifer G.
A2 - Aggarwal, Charu
A2 - Zhou, Zhi-Hua
A2 - Cui, Peng
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 15th IEEE International Conference on Data Mining Workshop, ICDMW 2015
Y2 - 14 November 2015 through 17 November 2015
ER -