TY - GEN
T1 - 5 sources of clickbaits you should know! Using synthetic clickbaits to improve prediction and distinguish between bot-generated and human-written headlines
AU - Le, Thai
AU - Shu, Kai
AU - Molina, Maria D.
AU - Lee, Dongwon
AU - Shyam Sundar, S.
AU - Liu, Huan
PY - 2019/8/27
Y1 - 2019/8/27
N2 - Clickbait is an attractive yet misleading headline that lures readers into click-conversion. However, the development of robust clickbait detection models has been hampered by a shortage of high-quality labeled training samples. To overcome this challenge, we investigate how to exploit human-written and machine-generated synthetic clickbaits. First, we ask crowdworkers and journalism students to write clickbait news headlines. Second, we use deep generative models to generate clickbait headlines. Through empirical evaluations, we demonstrate that synthetic clickbaits written by humans and generated by deep generative models consistently improve the accuracy of various prediction models, by as much as 14.5% in AUC, across two real datasets and different types of algorithms. Notably, we observe an improvement of up to 8.5% in AUC even for top-ranked clickbait detectors from the Clickbait Challenge 2017. Our study proposes a novel direction for addressing the shortage of labeled training data, one of the fundamental bottlenecks in supervised learning, by means of synthetic training data with reinforced domain knowledge. It also provides a solution for distinguishing between bot-generated and human-written clickbaits, thus aiding the work of moderators and better alerting news consumers.
UR - http://www.scopus.com/inward/record.url?scp=85075451341&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075451341&partnerID=8YFLogxK
U2 - 10.1145/3341161.3342875
DO - 10.1145/3341161.3342875
M3 - Conference contribution
T3 - Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2019
SP - 33
EP - 40
BT - Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2019
A2 - Spezzano, Francesca
A2 - Chen, Wei
A2 - Xiao, Xiaokui
PB - Association for Computing Machinery, Inc
T2 - 11th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2019
Y2 - 27 August 2019 through 30 August 2019
ER -