TY - GEN
T1 - Improving grouped-entity resolution using Quasi-Cliques
AU - On, Byung Won
AU - Elmacioglu, Ergin
AU - Lee, Dongwon
AU - Kang, Jaewoo
AU - Pei, Jian
PY - 2006
Y1 - 2006
AB - The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real-world entity, is essential in many applications. In this paper, we focus in particular on resolving entities that contain a group of related elements (e.g., an author entity with a list of citations, a singer entity with a song list, or an intermediate result of a GROUP BY SQL query). Such entities, called grouped-entities, occur frequently in many applications. Previous approaches to grouped-entity resolution often rely on textual similarity and produce a large number of false positives. As a complementary technique, we present our experience applying a recently proposed graph mining technique, Quasi-Clique, atop conventional ER solutions. In addition to syntactic similarity, our approach exploits contextual information mined from the group of elements belonging to each entity. Extensive experiments verify that our proposal improves precision and recall by up to 83% when used together with a variety of existing ER solutions, and never worsens them.
UR - http://www.scopus.com/inward/record.url?scp=47249101877&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=47249101877&partnerID=8YFLogxK
U2 - 10.1109/ICDM.2006.85
DO - 10.1109/ICDM.2006.85
M3 - Conference contribution
AN - SCOPUS:47249101877
SN - 0769527019
SN - 9780769527017
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 1008
EP - 1015
BT - Proceedings - Sixth International Conference on Data Mining, ICDM 2006
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International Conference on Data Mining, ICDM 2006
Y2 - 18 December 2006 through 22 December 2006
ER -