Scalable name disambiguation using multi-level graph partition

Byung Won On, Dongwon Lee

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

33 Scopus citations


When non-unique values are used as the identifiers of entities, homonyms can cause confusion. In particular, when (parts of) entity "names" are used as identifiers, the problem is often referred to as the name disambiguation problem, where the goal is to sort out entities that have been mixed up because of name homonyms (e.g., if only the last name is used as the identifier, one cannot distinguish "Vannevar Bush" from "George Bush"). In this paper, we study the scalability of the name disambiguation problem: how can it be resolved when (1) a small number of entities with large contents or (2) a large number of entities become indistinguishable due to homonyms? We first carefully examine two state-of-the-art solutions to the name disambiguation problem and point out their limitations with respect to scalability. Then, we adapt the multi-level graph partition technique to solve the large-scale name disambiguation problem. Our claim is validated empirically: our proposal shows orders of magnitude improvement in performance while maintaining equivalent or reasonable accuracy compared to competing solutions.
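The abstract does not say which multi-level algorithm, edge weights, or objective the authors use, so the following is only a generic METIS-style sketch of the multi-level idea, not the paper's method. It assumes that records sharing an ambiguous name (e.g., citations under "Bush") are graph nodes, that edge weights are hypothetical similarity scores, and that the number of true entities `k` is known. The graph is coarsened by heavy-edge matching, the coarsest graph is split into at most `k` parts, and the partition is projected back level by level with a simple greedy refinement pass, a much-simplified stand-in for Kernighan-Lin style refinement. All function names (`build_graph`, `heavy_edge_matching`, `contract`, `refine`, `multilevel_partition`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: node -> {neighbour: similarity weight}, kept symmetric.
Graph = Dict[int, Dict[int, float]]


def build_graph(edges: List[Tuple[int, int, float]]) -> Graph:
    """Build a symmetric weighted graph from (u, v, weight) triples."""
    graph: Dict[int, Dict[int, float]] = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)


def heavy_edge_matching(graph: Graph, target: int) -> Dict[int, int]:
    """Pair each unmatched node with its heaviest unmatched neighbour.

    Returns a map from fine node to coarse node id. Pairing stops once the
    coarse graph is down to `target` nodes.
    """
    mapping: Dict[int, int] = {}
    remaining = len(graph)
    next_id = 0
    for u in sorted(graph):
        if u in mapping:
            continue
        mapping[u] = next_id
        if remaining > target:
            candidates = [(w, v) for v, w in graph[u].items() if v not in mapping]
            if candidates:
                _, v = max(candidates)  # heaviest edge to an unmatched node
                mapping[v] = next_id
                remaining -= 1
        next_id += 1
    return mapping


def contract(graph: Graph, mapping: Dict[int, int]) -> Graph:
    """Collapse matched pairs into super-nodes, summing parallel edge weights."""
    coarse: Graph = {c: {} for c in sorted(set(mapping.values()))}
    for u, nbrs in graph.items():
        for v, w in nbrs.items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:  # edges inside a super-node disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0.0) + w
    return coarse


def refine(graph: Graph, part: Dict[int, int], passes: int) -> Dict[int, int]:
    """Greedy refinement: move a node to the part it is most strongly tied to."""
    for _ in range(passes):
        sizes = Counter(part.values())
        moved = False
        for u in sorted(graph):
            conn: Dict[int, float] = defaultdict(float)
            for v, w in graph[u].items():
                conn[part[v]] += w
            best = max(conn, key=conn.get, default=part[u])
            # Move only if it strictly lowers the cut and does not empty a part.
            if conn[best] > conn[part[u]] and sizes[part[u]] > 1:
                sizes[part[u]] -= 1
                sizes[best] += 1
                part[u] = best
                moved = True
        if not moved:
            break
    return part


def multilevel_partition(graph: Graph, k: int, passes: int = 2) -> Dict[int, int]:
    """Coarsen, partition the coarsest graph into <= k parts, then uncoarsen and refine."""
    levels: List[Tuple[Graph, Dict[int, int]]] = []
    current = graph
    while len(current) > k:
        mapping = heavy_edge_matching(current, k)
        if len(set(mapping.values())) == len(current):
            break  # nothing left to collapse (e.g. isolated nodes)
        levels.append((current, mapping))
        current = contract(current, mapping)
    # Initial partition: one part per coarse node; any surplus goes into the last part.
    part = {c: min(i, k - 1) for i, c in enumerate(sorted(current))}
    for finer, mapping in reversed(levels):
        part = {u: part[mapping[u]] for u in finer}  # project to the finer level
        part = refine(finer, part, passes)
    return part


if __name__ == "__main__":
    # Two tight groups of records joined by one weak link (hypothetical weights).
    g = build_graph([(0, 1, 0.9), (0, 2, 0.8), (1, 2, 0.9), (2, 3, 0.1),
                     (3, 4, 0.9), (3, 5, 0.8), (4, 5, 0.9)])
    print(multilevel_partition(g, k=2))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
```

In this toy run, heavy-edge matching happens to merge nodes 2 and 3 across the weak link during coarsening, and the refinement pass at the finest level moves node 3 back to its own group. Coarsening keeps the expensive work on a small graph, while refinement repairs mistakes made along the way; this is the general trade-off that makes multi-level schemes attractive for large inputs.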

Original language: English (US)
Title of host publication: Proceedings of the 7th SIAM International Conference on Data Mining
Publisher: Society for Industrial and Applied Mathematics Publications
Number of pages: 6
ISBN (Print): 9780898716306
State: Published - 2007
Event: 7th SIAM International Conference on Data Mining - Minneapolis, MN, United States
Duration: Apr 26 2007 to Apr 28 2007

Publication series

Name: Proceedings of the 7th SIAM International Conference on Data Mining


Other: 7th SIAM International Conference on Data Mining
Country/Territory: United States
City: Minneapolis, MN

All Science Journal Classification (ASJC) codes

  • General Engineering

