Two supervised learning approaches for name disambiguation in author citations

Hui Han, Lee Giles, Hongyuan Zha, Cheng Li, Kostas Tsioutsiouliklis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

303 Scopus citations

Abstract

Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in citations. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector Machines (SVMs) with the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: co-author names, the title of the paper, and the title of the journal or proceedings. We illustrate these two approaches on two types of data: one collected from the web, mainly publication lists from homepages, and the other collected from the DBLP citation databases.
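As a rough illustration of the generative approach named in the abstract, the following minimal sketch trains a multinomial naive Bayes classifier with add-one smoothing over the three citation attributes (co-author names, paper title, venue title). It is not the authors' implementation: the class `NaiveBayesDisambiguator`, the helper `features`, the dictionary layout of a citation, and all names and citations in the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def features(citation):
    """Turn a citation dict into a bag of attribute-tagged tokens (assumed layout)."""
    tokens = ["coauthor=" + name.lower() for name in citation["coauthors"]]
    tokens += ["title=" + word for word in citation["title"].lower().split()]
    tokens += ["venue=" + word for word in citation["venue"].lower().split()]
    return tokens


class NaiveBayesDisambiguator:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, citations, author_ids):
        self.prior = Counter(author_ids)      # citations per candidate author
        self.counts = defaultdict(Counter)    # token counts per candidate author
        self.vocab = set()
        for citation, author in zip(citations, author_ids):
            tokens = features(citation)
            self.counts[author].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, citation):
        tokens = features(citation)
        total = sum(self.prior.values())
        best, best_score = None, -math.inf
        for author, n in self.prior.items():
            denom = sum(self.counts[author].values()) + len(self.vocab)
            score = math.log(n / total)       # log prior
            for token in tokens:              # smoothed log likelihoods
                score += math.log((self.counts[author][token] + 1) / denom)
            if score > best_score:
                best, best_score = author, score
        return best


# Invented toy data: two distinct people who share the name "J. Smith".
train = [
    {"coauthors": ["A. Gupta", "B. Chen"],
     "title": "Query optimization in relational databases", "venue": "VLDB"},
    {"coauthors": ["B. Chen"],
     "title": "Indexing relational databases", "venue": "SIGMOD"},
    {"coauthors": ["C. Diaz"],
     "title": "Protein folding simulation", "venue": "Bioinformatics"},
    {"coauthors": ["C. Diaz", "E. Ford"],
     "title": "Simulation of protein structure", "venue": "Bioinformatics"},
]
labels = ["J. Smith (databases)", "J. Smith (databases)",
          "J. Smith (biology)", "J. Smith (biology)"]

model = NaiveBayesDisambiguator().fit(train, labels)
query = {"coauthors": ["B. Chen"], "title": "Join optimization", "venue": "VLDB"}
print(model.predict(query))
```

Prefixing each token with its attribute keeps a word in a paper title distinct from the same word in a venue name. The discriminative alternative described in the abstract would instead map the same tokens to a vector and train an SVM on it.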

Original language: English (US)
Title of host publication: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
Publisher: Association for Computing Machinery
Pages: 296-305
Number of pages: 10
ISBN (Print): 1581138326, 9781581138320
DOIs
State: Published - 2004
Event: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004 - Tucson, AZ, United States
Duration: Jun 7 2004 to Jun 11 2004

Publication series

Name: Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004

Other

Other: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
Country/Territory: United States
City: Tucson, AZ
Period: 6/7/04 to 6/11/04

All Science Journal Classification (ASJC) codes

  • General Engineering
