Abstract
Because of name variations, an author may have multiple names, and multiple authors may share the same name. Such name ambiguity degrades the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper presents a hierarchical naive Bayes mixture model, an unsupervised learning approach, for name disambiguation in author citations. The method partitions a collection of citations into clusters, with each cluster containing only citations written by the same author, thus disambiguating authorship in citations to induce author name identities. Three types of citation features are used: co-author names, paper title words, and journal or proceedings title words. The approach is illustrated with 16 name datasets constructed from publication lists collected from author homepages and the DBLP computer science bibliography.
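The clustering idea summarized in the abstract can be illustrated with a small sketch. It is not the paper's hierarchical model: it is a plain (flat) naive Bayes mixture fitted with EM, where each cluster keeps one smoothed multinomial per feature type. The function name `fit_mixture`, the seeding heuristic, the smoothing constant `alpha`, and the toy citations (invented author and venue strings) are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# The three citation feature types named in the abstract.
FEATURE_TYPES = ("coauthors", "title", "venue")


def fit_mixture(citations, k, iterations=30, alpha=0.1):
    """Cluster citations with a k-component naive Bayes mixture fitted by EM."""
    n = len(citations)
    vocab = {t: sorted({w for c in citations for w in c[t]}) for t in FEATURE_TYPES}
    tokens = [{(t, w) for t in FEATURE_TYPES for w in c[t]} for c in citations]

    # Assumed initialisation (not from the paper): seed each cluster with a
    # citation sharing as few tokens as possible with the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        rest = [i for i in range(n) if i not in seeds]
        seeds.append(min(rest, key=lambda i: max(len(tokens[i] & tokens[s]) for s in seeds)))
    resp = [[1.0 / k] * k for _ in range(n)]
    for j, s in enumerate(seeds):
        resp[s] = [1.0 if other == j else 0.0 for other in range(k)]

    for _ in range(iterations):
        # M-step: mixing weights and smoothed per-type multinomials.
        prior = [(sum(r[j] for r in resp) + alpha) / (n + k * alpha) for j in range(k)]
        theta = []
        for j in range(k):
            per_type = {}
            for t in FEATURE_TYPES:
                counts = defaultdict(float)
                for r, c in zip(resp, citations):
                    for w in c[t]:
                        counts[w] += r[j]
                total = sum(counts.values()) + alpha * len(vocab[t])
                per_type[t] = {w: (counts[w] + alpha) / total for w in vocab[t]}
            theta.append(per_type)

        # E-step: posterior cluster probabilities, computed in log space.
        for i, c in enumerate(citations):
            logp = [
                math.log(prior[j])
                + sum(math.log(theta[j][t][w]) for t in FEATURE_TYPES for w in c[t])
                for j in range(k)
            ]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]

    # Hard assignment: the most probable cluster for each citation.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


# Hypothetical citations for two different people publishing under one name.
citations = [
    {"coauthors": ["a. gupta", "b. chen"], "title": ["query", "optimization", "databases"], "venue": ["vldb"]},
    {"coauthors": ["a. gupta"], "title": ["index", "structures", "databases"], "venue": ["sigmod"]},
    {"coauthors": ["b. chen"], "title": ["query", "processing"], "venue": ["vldb"]},
    {"coauthors": ["m. rossi"], "title": ["image", "segmentation"], "venue": ["cvpr"]},
    {"coauthors": ["m. rossi", "k. tanaka"], "title": ["object", "recognition", "image"], "venue": ["iccv"]},
    {"coauthors": ["k. tanaka"], "title": ["image", "retrieval"], "venue": ["cvpr"]},
]

labels = fit_mixture(citations, k=2)
print(labels)
```

In this sketch the number of clusters `k` is supplied by the caller, and the first three toy citations end up in one cluster and the last three in the other. Real citation data is far noisier, so the result on a toy input says nothing about accuracy on the 16 name datasets.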
Original language | English (US) |
---|---|
Title of host publication | Applied Computing 2005 - Proceedings of the 20th Annual ACM Symposium on Applied Computing |
Pages | 1065-1069 |
Number of pages | 5 |
Volume | 2 |
State | Published - 2005 |
Event | 20th Annual ACM Symposium on Applied Computing - Santa Fe, NM, United States, Duration: Mar 13 2005 → Mar 17 2005 |
Other
Other | 20th Annual ACM Symposium on Applied Computing |
---|---|
Country/Territory | United States |
City | Santa Fe, NM |
Period | 3/13/05 → 3/17/05 |
All Science Journal Classification (ASJC) codes
- General Computer Science