Playscript classification and automatic Wikipedia play articles generation

Siddhartha Banerjee, Cornelia Caragea, Prasenjit Mitra

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

3 Scopus citations

Abstract

In this work, we aim to create Wikipedia pages on plays automatically by extracting relevant information from various web sources. Our approach involves building an efficient classifier that can classify web documents as play scripts. From the set of correctly classified instances of play scripts, we extract relevant play-related information from the documents and use it to obtain additional information from various sources on the web. This information is aggregated, and human-readable Wikipedia pages are created using a bot. The results of our experiments show that classifiers trained on a combination of our designed features and 'bag-of-words' (BoW) features outperform classifiers trained using only BoW features. Our results further show that our bot can create good-quality, human-readable pages. Such an automatic page-generation process can eventually ensure a more complete Wikipedia.
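The abstract does not name the designed features or the learning algorithm, so the following is only a minimal sketch of the general idea of concatenating hand-designed features with BoW features before training a classifier. The three structural features (speaker-line ratio, stage-direction ratio, ACT/SCENE heading ratio), the helper names, the use of plain logistic regression trained by SGD, and the toy documents are all hypothetical illustrations, not the authors' feature set or method.

```python
import math
import re
from collections import Counter

# Hypothetical "designed" cues for play scripts (illustrative, not the paper's features).
SPEAKER_RE = re.compile(r"^[A-Z][A-Z .'-]*[A-Z]\s*[:.](?:\s|$)")  # e.g. "HAMLET: ..."
HEADING_RE = re.compile(r"^\s*(act|scene)\b", re.IGNORECASE)      # e.g. "SCENE 2"


def bow_features(text):
    """Bag-of-words term counts, prefixed so keys cannot collide with designed features."""
    return Counter("bow=" + tok for tok in re.findall(r"[a-z']+", text.lower()))


def designed_features(text):
    """Structural cues, each expressed as a fraction of the non-empty lines."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    return {
        "design=speaker_ratio": sum(bool(SPEAKER_RE.match(ln)) for ln in lines) / n,
        "design=stage_dir_ratio": sum(ln.lstrip().startswith(("(", "[")) for ln in lines) / n,
        "design=act_scene_ratio": sum(bool(HEADING_RE.match(ln)) for ln in lines) / n,
    }


def combined_features(text):
    """Concatenate BoW and designed features into one sparse vector (a dict)."""
    feats = dict(bow_features(text))
    feats.update(designed_features(text))
    return feats


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(docs, labels, epochs=50, lr=0.5):
    """Logistic regression fit by per-example SGD on sparse dict features."""
    w, b = {}, 0.0
    data = [combined_features(d) for d in docs]
    for _ in range(epochs):
        for x, y in zip(data, labels):
            p = sigmoid(b + sum(w.get(k, 0.0) * v for k, v in x.items()))
            g = p - y  # derivative of the log loss with respect to the logit
            b -= lr * g
            for k, v in x.items():
                w[k] = w.get(k, 0.0) - lr * g * v
    return w, b


def predict_proba(model, text):
    """Estimated probability that `text` is a play script."""
    w, b = model
    x = combined_features(text)
    return sigmoid(b + sum(w.get(k, 0.0) * v for k, v in x.items()))


# Toy training data (made up for illustration): label 1 = play script, 0 = other text.
TRAIN_DOCS = [
    "ACT I\nSCENE 1. A room.\nHAMLET: Who is there?\nHORATIO: Friends to this ground.\n(Exit HAMLET)",
    "SCENE 2\nROMEO: But soft, what light?\nJULIET: Ay me.\n[Enter NURSE]",
    "The committee met on Tuesday to review the budget.\nMembers discussed several proposals.",
    "Rainfall was heavy this season.\nFarmers reported good harvests across the region.",
]
TRAIN_LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    model = train_logreg(TRAIN_DOCS, TRAIN_LABELS)
    print(predict_proba(model, "SCENE 3\nMACBETH: Is this a dagger?\n(Exit)"))
    print(predict_proba(model, "The report was published on Monday."))
```

Keeping BoW counts and designed features in one sparse dict makes the "BoW only" baseline easy to reproduce: replace `combined_features` with `bow_features` and retrain, which mirrors the kind of comparison the abstract reports (the real experiments presumably used larger corpora and stronger learners).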

Original language: English (US)
Title of host publication: Proceedings - International Conference on Pattern Recognition
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3630-3635
Number of pages: 6
ISBN (Electronic): 9781479952083
DOIs:
State: Published - Dec 4 2014
Event: 22nd International Conference on Pattern Recognition, ICPR 2014 - Stockholm, Sweden
Duration: Aug 24 2014 to Aug 28 2014

Publication series

Name: Proceedings - International Conference on Pattern Recognition
ISSN (Print): 1051-4651

Other

Other: 22nd International Conference on Pattern Recognition, ICPR 2014
Country/Territory: Sweden
City: Stockholm
Period: 8/24/14 to 8/28/14

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
