TY - GEN
T1 - Identifying value mappings for data integration
T2 - 6th International Conference on Web Information Systems Engineering, WISE 2005
AU - Kang, Jaewoo
AU - Lee, Dongwon
AU - Mitra, Prasenjit
N1 - Copyright: Copyright 2008 Elsevier B.V., All rights reserved.
PY - 2005
Y1 - 2005
N2 - The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data among sources abounds. Most current data cleaning solutions assume that data values referencing the same object bear some textual similarity. However, this assumption is often violated in practice. "Two-door front wheel drive" can be represented as "2DR-FWD" or "R2FD", or even as "CAR TYPE 3" in different data sources. To address this problem, we propose a novel two-step automated technique that exploits statistical dependency structures among objects, which are invariant to the tokens representing the objects. The algorithm achieved high accuracy in our empirical study, suggesting that it can be a useful addition to existing information integration techniques.
AB - The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data among sources abounds. Most current data cleaning solutions assume that data values referencing the same object bear some textual similarity. However, this assumption is often violated in practice. "Two-door front wheel drive" can be represented as "2DR-FWD" or "R2FD", or even as "CAR TYPE 3" in different data sources. To address this problem, we propose a novel two-step automated technique that exploits statistical dependency structures among objects, which are invariant to the tokens representing the objects. The algorithm achieved high accuracy in our empirical study, suggesting that it can be a useful addition to existing information integration techniques.
UR - http://www.scopus.com/inward/record.url?scp=33744788355&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33744788355&partnerID=8YFLogxK
U2 - 10.1007/11581062_46
DO - 10.1007/11581062_46
M3 - Conference contribution
AN - SCOPUS:33744788355
SN - 3540300171
SN - 9783540300175
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 544
EP - 551
BT - Web Information Systems Engineering, WISE 2005 - 6th International Conference on Web Information Systems Engineering, Proceedings
Y2 - 20 November 2005 through 22 November 2005
ER -