TY - GEN
T1 - Scalable name disambiguation using multi-level graph partition
AU - On, Byung Won
AU - Lee, Dongwon
N1 - Copyright 2020 Elsevier B.V., All rights reserved.
PY - 2007
Y1 - 2007
N2 - When non-unique values are used as the identifier of entities, confusion can occur due to homonyms. In particular, when (part of) the "names" of entities are used as their identifier, the problem is often referred to as the name disambiguation problem, where the goal is to sort out the erroneous entities due to name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish "Vannevar Bush" from "George Bush"). In this paper, we study the scalability issue of the name disambiguation problem: when (1) a small number of entities with large contents or (2) a large number of entities become indistinguishable due to homonyms, how can the confusion be resolved? We first carefully examine two state-of-the-art solutions to the name disambiguation problem and point out their limitations with respect to scalability. Then, we adapt the multi-level graph partition technique to solve the large-scale name disambiguation problem. Our claim is empirically validated via experimentation: our proposal shows orders of magnitude improvement in terms of performance while maintaining equivalent or reasonable accuracy compared to competing solutions.
AB - When non-unique values are used as the identifier of entities, confusion can occur due to homonyms. In particular, when (part of) the "names" of entities are used as their identifier, the problem is often referred to as the name disambiguation problem, where the goal is to sort out the erroneous entities due to name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish "Vannevar Bush" from "George Bush"). In this paper, we study the scalability issue of the name disambiguation problem: when (1) a small number of entities with large contents or (2) a large number of entities become indistinguishable due to homonyms, how can the confusion be resolved? We first carefully examine two state-of-the-art solutions to the name disambiguation problem and point out their limitations with respect to scalability. Then, we adapt the multi-level graph partition technique to solve the large-scale name disambiguation problem. Our claim is empirically validated via experimentation: our proposal shows orders of magnitude improvement in terms of performance while maintaining equivalent or reasonable accuracy compared to competing solutions.
UR - http://www.scopus.com/inward/record.url?scp=70449123512&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70449123512&partnerID=8YFLogxK
U2 - 10.1137/1.9781611972771.64
DO - 10.1137/1.9781611972771.64
M3 - Conference contribution
AN - SCOPUS:70449123512
SN - 9780898716306
T3 - Proceedings of the 7th SIAM International Conference on Data Mining
SP - 575
EP - 580
BT - Proceedings of the 7th SIAM International Conference on Data Mining
PB - Society for Industrial and Applied Mathematics Publications
T2 - 7th SIAM International Conference on Data Mining
Y2 - 26 April 2007 through 28 April 2007
ER -