TY - GEN
T1 - Characterization and Early Detection of Evergreen News Articles
AU - Liao, Yiming
AU - Wang, Shuguang
AU - Han, Eui Hong
AU - Lee, Jongwuk
AU - Lee, Dongwon
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
N2 - Although the majority of news articles are viewed for only days or weeks, a small fraction are read across years; these are called evergreen news articles. Because evergreen articles maintain a timeless quality and are consistently of interest to the public, understanding their characteristics better has major implications for news outlets and platforms, yet few studies have explicitly investigated them. Addressing this gap, we first propose a flexible, parameterized definition of evergreen articles that captures their long-term high-traffic patterns. Using a real dataset from the Washington Post, we then identify several distinctive characteristics of evergreen articles and build an early prediction model with encouraging results. Although less than 1% of news articles were identified as evergreen, our model achieves 0.961 in ROC AUC and 0.172 in PR AUC in 10-fold cross-validation.
AB - Although the majority of news articles are viewed for only days or weeks, a small fraction are read across years; these are called evergreen news articles. Because evergreen articles maintain a timeless quality and are consistently of interest to the public, understanding their characteristics better has major implications for news outlets and platforms, yet few studies have explicitly investigated them. Addressing this gap, we first propose a flexible, parameterized definition of evergreen articles that captures their long-term high-traffic patterns. Using a real dataset from the Washington Post, we then identify several distinctive characteristics of evergreen articles and build an early prediction model with encouraging results. Although less than 1% of news articles were identified as evergreen, our model achieves 0.961 in ROC AUC and 0.172 in PR AUC in 10-fold cross-validation.
UR - http://www.scopus.com/inward/record.url?scp=85084795907&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084795907&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-46133-1_33
DO - 10.1007/978-3-030-46133-1_33
M3 - Conference contribution
AN - SCOPUS:85084795907
SN - 9783030461324
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 552
EP - 568
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings
A2 - Brefeld, Ulf
A2 - Fromont, Elisa
A2 - Hotho, Andreas
A2 - Knobbe, Arno
A2 - Maathuis, Marloes
A2 - Robardet, Céline
PB - Springer
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
Y2 - 16 September 2019 through 20 September 2019
ER -