Entity resolution using search engine results

Madian Khabsa, Pucktada Treeratpituk, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Scopus citations

Abstract

Given a set E of n automatically extracted entities, we would like to cluster together all the names that refer to the same canonical entity. The variations of each entity include acronyms, full names, and informal naming conventions. We propose using search engine results to cluster the variations of each entity based on the URLs appearing in those results. We create a cluster C for each top search result returned by querying for an entity e ∈ E, and assign e to the cluster C. Our experiments on a manually created dataset show that our approach achieves higher precision and recall than string matching algorithms and hierarchical clustering based disambiguation methods.
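The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cluster_by_top_urls`, the cutoff parameter `k`, and the hand-written `toy_results` dictionary (standing in for real search engine output) are all assumptions made for illustration, and the abstract does not say how many top results are used or how URLs are normalized.

```python
from collections import defaultdict


def cluster_by_top_urls(search_results, k=1):
    """Group entity names by the URLs in their top-k search results.

    search_results: dict mapping each entity name to its ranked list of
    result URLs (hypothetical input; no search engine is queried here).
    Returns a dict mapping each URL to the sorted entity names assigned to it.
    """
    clusters = defaultdict(set)
    for entity, urls in search_results.items():
        # One cluster per top result URL; the entity joins each such cluster.
        for url in urls[:k]:
            clusters[url].add(entity)
    return {url: sorted(names) for url, names in clusters.items()}


# Hypothetical, hand-written results standing in for a real search engine.
toy_results = {
    "PSU": ["https://www.psu.edu/", "https://example.org/psu"],
    "Penn State": ["https://www.psu.edu/"],
    "Pennsylvania State University": ["https://www.psu.edu/"],
    "MIT": ["https://www.mit.edu/"],
    "Massachusetts Institute of Technology": ["https://www.mit.edu/"],
}

clusters = cluster_by_top_urls(toy_results, k=1)
for url, names in clusters.items():
    print(url, names)
```

With `k=1`, name variants whose first result is the same URL end up in the same cluster. A larger `k` would let one entity belong to several clusters, which a real system would then need to reconcile.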

Original language: English (US)
Title of host publication: CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pages: 2363-2366
Number of pages: 4
State: Published - 2012
Event: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012 - Maui, HI, United States
Duration: Oct 29 2012 to Nov 2 2012

Publication series

Name: ACM International Conference Proceeding Series

Other

Other: 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Country/Territory: United States
City: Maui, HI
Period: 10/29/12 to 11/2/12

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
