TY - GEN
T1 - Two supervised learning approaches for name disambiguation in author citations
AU - Han, Hui
AU - Giles, Lee
AU - Zha, Hongyuan
AU - Li, Cheng
AU - Tsioutsiouliklis, Kostas
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2004
Y1 - 2004
N2 - Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in the citations. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector Machines (SVMs) and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: co-author names, the title of the paper, and the title of the journal or proceeding. We illustrate these two approaches on two types of data, one collected from the web, mainly publication lists from homepages, the other collected from the DBLP citation databases.
AB - Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in the citations. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector Machines (SVMs) and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: co-author names, the title of the paper, and the title of the journal or proceeding. We illustrate these two approaches on two types of data, one collected from the web, mainly publication lists from homepages, the other collected from the DBLP citation databases.
UR - http://www.scopus.com/inward/record.url?scp=4944235920&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4944235920&partnerID=8YFLogxK
U2 - 10.1145/996350.996419
DO - 10.1145/996350.996419
M3 - Conference contribution
AN - SCOPUS:4944235920
SN - 1581138326
SN - 9781581138320
T3 - Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004
SP - 296
EP - 305
BT - Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
PB - Association for Computing Machinery
T2 - Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
Y2 - 7 June 2004 through 11 June 2004
ER -