A hierarchical naive bayes mixture model for name disambiguation in author citations

Hui Han, Hongyuan Zha, Wei Xu, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

71 Scopus citations

Abstract

Because of name variations, an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper presents a hierarchical naive Bayes mixture model, an unsupervised learning approach, for name disambiguation in author citations. The method partitions a collection of citations into clusters, with each cluster containing only citations authored by the same author, thus disambiguating authorship in citations to induce author name identities. Three types of citation features are used: co-author names, paper title words, and journal or proceeding title words. The approach is illustrated with 16 name datasets constructed from publication lists collected from author homepages and the DBLP computer science bibliography.
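
The clustering idea in the abstract can be illustrated with a minimal sketch. This is not the paper's hierarchical model, whose details the abstract does not give; it is a simplified flat multinomial naive Bayes mixture fitted with EM, assuming the number of authors `k` is known in advance. The function names `citation_tokens` and `fit_mixture`, the smoothing constant `alpha`, the crude initialisation, and the toy citations are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def citation_tokens(coauthors, title, venue):
    """Map one citation to a bag of tokens, prefixed by feature type."""
    tokens = ["co:" + name.lower() for name in coauthors]
    tokens += ["ti:" + word for word in title.lower().split()]
    tokens += ["ve:" + word for word in venue.lower().split()]
    return Counter(tokens)


def fit_mixture(docs, k, iters=30, alpha=0.1):
    """Fit a flat multinomial naive Bayes mixture with EM (illustrative only).

    docs is a list of Counters; returns one cluster index per document.
    """
    vocab = sorted({token for doc in docs for token in doc})
    # Crude deterministic start (an assumption): doc i leans towards cluster i % k.
    resp = []
    for i in range(len(docs)):
        row = [10.0 if j == i % k else 1.0 for j in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(iters):
        # M-step: mixing weights and smoothed per-cluster token probabilities.
        log_prior, log_prob = [], []
        for j in range(k):
            weight = sum(row[j] for row in resp)
            log_prior.append(math.log((weight + alpha) / (len(docs) + k * alpha)))
            counts = Counter()
            for row, doc in zip(resp, docs):
                for token, c in doc.items():
                    counts[token] += row[j] * c
            denom = sum(counts.values()) + alpha * len(vocab)
            log_prob.append({t: math.log((counts[t] + alpha) / denom) for t in vocab})
        # E-step: posterior responsibility of each cluster for each document.
        resp = []
        for doc in docs:
            scores = [
                log_prior[j] + sum(c * log_prob[j][t] for t, c in doc.items())
                for j in range(k)
            ]
            top = max(scores)
            unnorm = [math.exp(s - top) for s in scores]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

    # Hard assignment: the most responsible cluster for each citation.
    return [max(range(k), key=lambda j: row[j]) for row in resp]


# Invented toy citations for one ambiguous name shared by two people.
citations = [
    citation_tokens(["Coauthor One"], "query optimization in databases", "VLDB"),
    citation_tokens(["Coauthor Two"], "protein folding simulation", "Bioinformatics"),
    citation_tokens(["Coauthor One"], "join ordering for query plans", "SIGMOD"),
    citation_tokens(["Coauthor Two"], "protein structure prediction", "Bioinformatics"),
]
labels = fit_mixture(citations, k=2)
print(labels)
```

Prefixing each token with its feature type keeps co-author names, title words, and venue words from colliding in one vocabulary. EM can settle in a poor local optimum, so a practical version would likely use several random restarts rather than this fixed start.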

Original language: English (US)
Title of host publication: Applied Computing 2005 - Proceedings of the 20th Annual ACM Symposium on Applied Computing
Pages: 1065-1069
Number of pages: 5
Volume: 2
State: Published - 2005
Event: 20th Annual ACM Symposium on Applied Computing - Santa Fe, NM, United States
Duration: Mar 13 2005 to Mar 17 2005


All Science Journal Classification (ASJC) codes

  • General Computer Science
