Improving grouped-entity resolution using Quasi-Cliques

Byung Won On, Ergin Elmacioglu, Dongwon Lee, Jaewoo Kang, Jian Pei

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

41 Scopus citations

Abstract

The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real-world entity, is essential in many applications. In this paper, we focus on resolving entities that each contain a group of related elements (e.g., an author entity with a list of citations, a singer entity with a song list, or an intermediate result of a GROUP BY SQL query). Such entities, which we call grouped-entities, occur frequently in many applications. Previous approaches to grouped-entity resolution often rely on textual similarity and produce a large number of false positives. As a complementary technique, we present our experience of applying a recently proposed graph mining technique, Quasi-Clique, atop conventional ER solutions. Our approach exploits contextual information mined from the group of elements in each entity, in addition to syntactic similarity. Extensive experiments verify that our proposal improves precision and recall by up to 83% when used together with a variety of existing ER solutions, and never worsens them.
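
The abstract does not spell out the Quasi-Clique definition. For orientation, a commonly used formulation calls a vertex set S a γ-quasi-clique when every vertex in S is adjacent to at least γ·(|S|−1) of the other vertices in S. The Python sketch below checks only that condition on a tiny co-author graph. The function name `is_quasi_clique`, the adjacency-dict representation, and the toy graph are illustrative assumptions, not the authors' implementation, and the sketch does not cover how the paper mines or scores quasi-cliques.

```python
def is_quasi_clique(adj, nodes, gamma):
    """Return True if `nodes` forms a gamma-quasi-clique in the undirected
    graph `adj` (a dict mapping each vertex to the set of its neighbours).

    Assumed definition: every vertex in the set must be adjacent to at
    least gamma * (|S| - 1) other vertices of the same set.
    """
    nodes = set(nodes)
    if len(nodes) < 2:
        return False  # treat sets with fewer than two vertices as trivial
    required = gamma * (len(nodes) - 1)
    return all(len(adj.get(v, set()) & nodes) >= required for v in nodes)


# Hypothetical co-author graph built from one author entity's citations.
coauthors = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

triangle_is_clique = is_quasi_clique(coauthors, {"a", "b", "c"}, 1.0)
loose_group_half = is_quasi_clique(coauthors, {"a", "b", "c", "d"}, 0.5)
loose_group_third = is_quasi_clique(coauthors, {"a", "b", "c", "d"}, 0.3)
```

With γ = 1.0 the test reduces to an ordinary clique check. Lowering γ admits looser groups, which is what makes the structure tolerant of missing links in noisy contextual data.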

Original language: English (US)
Title of host publication: Proceedings - Sixth International Conference on Data Mining, ICDM 2006
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1008-1015
Number of pages: 8
ISBN (Print): 0769527019, 9780769527017
State: Published - 2006
Event: 6th International Conference on Data Mining, ICDM 2006 - Hong Kong, China
Duration: Dec 18 2006 to Dec 22 2006

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Other

Other: 6th International Conference on Data Mining, ICDM 2006
Country/Territory: China
City: Hong Kong
Period: 12/18/06 to 12/22/06

All Science Journal Classification (ASJC) codes

  • General Engineering
