TY - GEN
T1 - Web based linkage
AU - Elmacioglu, Ergin
AU - Kan, Min Yen
AU - Lee, Dongwon
AU - Zhang, Yi
PY - 2007
Y1 - 2007
AB - When a variety of names are used for the same real-world entity, the problem of detecting all such variants is known as the (record) linkage or entity resolution problem. To address this problem, we propose a novel approach that uses the Web as a collective knowledge source in addition to the contents of the entities. Our hypothesis is that if an entity e1 is a duplicate of another entity e2, and if e1 frequently appears together with information I on the Web, then e2 may also appear frequently with I on the Web. Using search engines, we analyze the frequency, URLs, or contents of the returned web pages to capture the information I of an entity. Extensive experiments verify that our hypothesis holds in many real settings, and that the idea of using the Web as an additional source for the linkage problem is promising. Our proposal shows 51% (on average) and 193% (at best) improvement in precision/recall compared to a baseline approach.
UR - http://www.scopus.com/inward/record.url?scp=77951199843&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951199843&partnerID=8YFLogxK
U2 - 10.1145/1316902.1316922
DO - 10.1145/1316902.1316922
M3 - Conference contribution
AN - SCOPUS:77951199843
SN - 9781595938299
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 121
EP - 128
BT - Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07
T2 - 9th Annual ACM International Workshop on Web Information and Data Management, WIDM '07, Co-located with the 16th ACM Conference on Information and Knowledge Management, CIKM '07
Y2 - 6 November 2007 through 9 November 2007
ER -