TY - GEN
T1 - Filling the gaps
T2 - ACM Symposium on Document Engineering, DocEng 2015
AU - Banerjee, Siddhartha
AU - Mitra, Prasenjit
N1 - Publisher Copyright: © 2015 ACM.
PY - 2015/9/8
Y1 - 2015/9/8
N2 - The availability of only a limited number of contributors on Wikipedia cannot ensure consistent growth and improvement of the online encyclopedia. With information being scattered on the web, our goal is to automate the process of generation of content for Wikipedia. In this work, we propose a technique of improving stubs on Wikipedia that do not contain comprehensive information. A classifier learns features from the existing comprehensive articles on Wikipedia and recommends content that can be added to the stubs to improve the completeness of such stubs. We conduct experiments using several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep learning based architecture (deep belief network), and a TFIDF based classifier. Our experiments reveal that the LDA based model outperforms the other models (6% F-score). Our generation approach shows that this technique is capable of generating comprehensive articles. ROUGE-2 scores of the articles generated by our system outperform the articles generated using the baseline. Content generated by our system has been appended to several stubs and successfully retained in Wikipedia.
AB - The availability of only a limited number of contributors on Wikipedia cannot ensure consistent growth and improvement of the online encyclopedia. With information being scattered on the web, our goal is to automate the process of generation of content for Wikipedia. In this work, we propose a technique of improving stubs on Wikipedia that do not contain comprehensive information. A classifier learns features from the existing comprehensive articles on Wikipedia and recommends content that can be added to the stubs to improve the completeness of such stubs. We conduct experiments using several classifiers: a Latent Dirichlet Allocation (LDA) based model, a deep learning based architecture (deep belief network), and a TFIDF based classifier. Our experiments reveal that the LDA based model outperforms the other models (6% F-score). Our generation approach shows that this technique is capable of generating comprehensive articles. ROUGE-2 scores of the articles generated by our system outperform the articles generated using the baseline. Content generated by our system has been appended to several stubs and successfully retained in Wikipedia.
UR - http://www.scopus.com/inward/record.url?scp=84959229664&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959229664&partnerID=8YFLogxK
U2 - 10.1145/2682571.2797073
DO - 10.1145/2682571.2797073
M3 - Conference contribution
AN - SCOPUS:84959229664
T3 - DocEng 2015 - Proceedings of the 2015 ACM Symposium on Document Engineering
SP - 117
EP - 120
BT - DocEng 2015 - Proceedings of the 2015 ACM Symposium on Document Engineering
PB - Association for Computing Machinery, Inc
Y2 - 8 September 2015 through 11 September 2015
ER -