Identifying value mappings for data integration: An unsupervised approach

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data among sources abounds. Most current data cleaning solutions assume that data values referencing the same object bear some textual similarity. However, this assumption is often violated in practice. "Two-door front wheel drive" can be represented as "2DR-FWD" or "R2FD", or even as "CAR TYPE 3" in different data sources. To address this problem, we propose a novel two-step automated technique that exploits statistical dependency structures among objects, which are invariant to the tokens representing the objects. The algorithm achieved high accuracy in our empirical study, suggesting that it can be a useful addition to existing information integration techniques.
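The idea that statistical dependencies survive a change of tokens can be illustrated with a small sketch. This is not the paper's two-step algorithm, whose details the abstract does not give. It is a toy example under stated assumptions: both sources share one already-aligned attribute (a hypothetical "segment" column), and opaque codes are matched by comparing their conditional distributions over that attribute. The codes "4DR-AWD" and "CAR TYPE 7", the function names, and all row counts are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def conditional_profiles(rows):
    """Map each opaque code to its distribution over the shared attribute."""
    counts = defaultdict(Counter)
    for code, shared in rows:
        counts[code][shared] += 1
    profiles = {}
    for code, counter in counts.items():
        total = sum(counter.values())
        profiles[code] = {k: v / total for k, v in counter.items()}
    return profiles


def l1_distance(p, q):
    """Sum of absolute differences between two discrete distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def best_mapping(rows_a, rows_b):
    """Brute-force the one-to-one mapping with the lowest total distance.

    Assumes source B has at least as many distinct codes as source A;
    otherwise no candidate mapping exists and None is returned.
    Only practical for a handful of distinct codes.
    """
    prof_a = conditional_profiles(rows_a)
    prof_b = conditional_profiles(rows_b)
    codes_a = sorted(prof_a)
    codes_b = sorted(prof_b)
    best, best_cost = None, float("inf")
    for perm in permutations(codes_b, len(codes_a)):
        cost = sum(l1_distance(prof_a[a], prof_b[b]) for a, b in zip(codes_a, perm))
        if cost < best_cost:
            best, best_cost = dict(zip(codes_a, perm)), cost
    return best


# Hypothetical rows: (opaque body-type code, shared "segment" attribute).
source_a = (
    [("2DR-FWD", "compact")] * 3 + [("2DR-FWD", "sedan")] * 1
    + [("4DR-AWD", "suv")] * 3 + [("4DR-AWD", "sedan")] * 1
)
source_b = (
    [("CAR TYPE 3", "compact")] * 6 + [("CAR TYPE 3", "sedan")] * 2
    + [("CAR TYPE 7", "suv")] * 5 + [("CAR TYPE 7", "sedan")] * 3
)

mapping = best_mapping(source_a, source_b)
print(mapping)
```

The codes share no textual similarity, yet their co-occurrence profiles line up, so the match is recovered from the statistics alone. A fully unsupervised setting would not have an aligned attribute to lean on, which is the harder problem the abstract addresses.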

Original language: English (US)
Title of host publication: Web Information Systems Engineering, WISE 2005 - 6th International Conference on Web Information Systems Engineering, Proceedings
Pages: 544-551
Number of pages: 8
State: Published - 2005
Event: 6th International Conference on Web Information Systems Engineering, WISE 2005 - New York, NY, United States
Duration: Nov 20 2005 to Nov 22 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3806 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Web Information Systems Engineering, WISE 2005
Country/Territory: United States
City: New York, NY
Period: 11/20/05 to 11/22/05

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science (all)
