Automatic class labeling for CiteSeerX

Surya Dhairya Kashireddy, Susan Gauch, Syed Masum Billah

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

The CiteSeerX project at the University of Arkansas uses a browsing interface based on the Association for Computing Machinery's Computing Classification System (ACM CCS). CCS contains just 369 categories, whereas the CiteSeerX database contains over 2 million documents. This results in more than 6500 documents per category, far too many to browse. To address this problem, we are exploring ways to automatically expand the CCS ontology. Previous work has focused on using clustering to automatically identify the new classes. This work focuses on how to label the subclasses in a semantically meaningful way so that they can support user browsing. We develop methods based on text mining from the subclass members to extract class labels. We evaluate three methods by comparing the suggested labels with human-assigned labels for existing categories.

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
Pages: 241-245
Number of pages: 5
State: Published - 2013
Event: 2013 12th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013 - Atlanta, GA, United States
Duration: Nov 17 2013 → Nov 20 2013

Publication series

Name: Proceedings - 2013 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
Volume: 1

Other

Other: 2013 12th IEEE/WIC/ACM International Conference on Web Intelligence, WI 2013
Country/Territory: United States
City: Atlanta, GA
Period: 11/17/13 → 11/20/13

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
