Extracting author meta-data from web using visual features

Shuyi Zheng, Ding Zhou, Jia Li, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

Enriching a digital library's author meta-data can lead to valuable services and applications. This paper addresses the problem of extracting authors' information from their homepages, which can be cast as a multiclass classification problem. A homepage can be treated as a group of information pieces that need to be classified into different fields, e.g., Name, Title, Affiliation, and Email. In this setting, each information piece can be viewed as a point in a feature space, and certain patterns can also be observed among the different fields on a page. To improve extraction accuracy, this paper argues that the visual features of information pieces on a homepage should be fully utilized. It also proposes an inter-fields probability model that captures the relations among different fields and can be combined with feature-space-based classification. Experimental results demonstrate that utilizing visual features and applying the inter-fields probability model can significantly improve extraction accuracy.
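
The idea of combining per-piece classification with a model of relations among fields can be illustrated with a small sketch. The abstract does not specify the form of the inter-fields probability model, so the code below assumes a first-order chain over pieces in page order (each field depends only on the field of the preceding piece) and decodes it with the Viterbi algorithm. The function name `combine`, the field list, and every probability value are hypothetical toy inputs chosen for illustration, not data or methods from the paper.

```python
import math

FIELDS = ["Name", "Title", "Affiliation", "Email"]

# Hypothetical per-piece classifier outputs P(field | features), in page order.
piece_scores = [
    {"Name": 0.80, "Title": 0.10, "Affiliation": 0.05, "Email": 0.05},
    {"Name": 0.05, "Title": 0.40, "Affiliation": 0.50, "Email": 0.05},
    {"Name": 0.05, "Title": 0.05, "Affiliation": 0.80, "Email": 0.10},
    {"Name": 0.03, "Title": 0.03, "Affiliation": 0.04, "Email": 0.90},
]

# Hypothetical inter-field probabilities: P(first field) and P(next | previous).
start = {"Name": 0.7, "Title": 0.1, "Affiliation": 0.1, "Email": 0.1}
transition = {
    "Name": {"Name": 0.05, "Title": 0.60, "Affiliation": 0.30, "Email": 0.05},
    "Title": {"Name": 0.05, "Title": 0.05, "Affiliation": 0.70, "Email": 0.20},
    "Affiliation": {"Name": 0.05, "Title": 0.05, "Affiliation": 0.20, "Email": 0.70},
    "Email": {"Name": 0.25, "Title": 0.25, "Affiliation": 0.25, "Email": 0.25},
}


def independent(scores):
    """Label each piece on its own, ignoring relations among fields."""
    return [max(FIELDS, key=lambda f: s[f]) for s in scores]


def combine(scores, transition, start):
    """Viterbi decoding: maximize the sum of log classifier scores
    and log inter-field probabilities over the whole label sequence."""
    best = [{f: math.log(start[f]) + math.log(scores[0][f]) for f in FIELDS}]
    back = []
    for s in scores[1:]:
        row, ptr = {}, {}
        for f in FIELDS:
            # Best previous field for ending this piece in field f.
            prev = max(FIELDS, key=lambda p: best[-1][p] + math.log(transition[p][f]))
            row[f] = best[-1][prev] + math.log(transition[prev][f]) + math.log(s[f])
            ptr[f] = prev
        best.append(row)
        back.append(ptr)
    # Trace the best path backwards from the best final field.
    path = [max(FIELDS, key=lambda f: best[-1][f])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


print(independent(piece_scores))
print(combine(piece_scores, transition, start))
```

With these toy numbers, the second piece is ambiguous between Title and Affiliation when classified alone; under the assumed chain model, the surrounding fields shift the decision toward Title. Whether the paper's model behaves this way on real homepages cannot be inferred from the abstract.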

Original language: English (US)
Title of host publication: ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops
Pages: 33-38
Number of pages: 6
State: Published - 2007
Event: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007 - Omaha, NE, United States
Duration: Oct 28 2007 to Oct 31 2007

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
Country/Territory: United States
City: Omaha, NE
Period: 10/28/07 to 10/31/07

All Science Journal Classification (ASJC) codes

  • General Engineering
