TY - GEN
T1 - Entity resolution using search engine results
AU - Khabsa, Madian
AU - Treeratpituk, Pucktada
AU - Giles, C. Lee
PY - 2012
Y1 - 2012
N2 - Given a set of automatically extracted entities E of size n, we would like to cluster all the various names referring to the same canonical entity together. The variations of each entity include acronyms, full names, and informal naming conventions. We propose using search engine results to cluster variations of each entity based on the URLs appearing in those results. We create a cluster C for each top search result returned by querying for the entity e ∈ E, assigning e to the cluster C. Our experiments on a manually created dataset show that our approach achieves higher precision and recall than string matching algorithms and hierarchical clustering based disambiguation methods.
AB - Given a set of automatically extracted entities E of size n, we would like to cluster all the various names referring to the same canonical entity together. The variations of each entity include acronyms, full names, and informal naming conventions. We propose using search engine results to cluster variations of each entity based on the URLs appearing in those results. We create a cluster C for each top search result returned by querying for the entity e ∈ E, assigning e to the cluster C. Our experiments on a manually created dataset show that our approach achieves higher precision and recall than string matching algorithms and hierarchical clustering based disambiguation methods.
UR - http://www.scopus.com/inward/record.url?scp=84871087336&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84871087336&partnerID=8YFLogxK
U2 - 10.1145/2396761.2398641
DO - 10.1145/2396761.2398641
M3 - Conference contribution
AN - SCOPUS:84871087336
SN - 9781450311564
T3 - ACM International Conference Proceeding Series
SP - 2363
EP - 2366
BT - CIKM 2012 - Proceedings of the 21st ACM International Conference on Information and Knowledge Management
T2 - 21st ACM International Conference on Information and Knowledge Management, CIKM 2012
Y2 - 29 October 2012 through 2 November 2012
ER -