TY - JOUR
T1 - Uninterpreted schema matching with embedded value mapping under opaque column names and data values
AU - Jaiswal, Anuj
AU - Miller, David J.
AU - Mitra, Prasenjit
N1 - Funding Information:
This work was performed with support from the National Visualization and Analytics Center (NVAC), a US Department of Homeland Security Program, under the auspices of the Northeast Regional Visualization and Analytics Center (NEVAC). NVAC is operated by the Pacific Northwest National Laboratory (PNNL), a US Department of Energy Office of Science Laboratory.
PY - 2010/2
Y1 - 2010/2
N2 - Schema matching and value mapping across two heterogenous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes/columns to be matched and due to multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods, and thus, should form a useful addition to a suite of (semi) automated tools for resolving structural heterogeneity.
AB - Schema matching and value mapping across two heterogenous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes/columns to be matched and due to multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods, and thus, should form a useful addition to a suite of (semi) automated tools for resolving structural heterogeneity.
UR - http://www.scopus.com/inward/record.url?scp=75449088101&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=75449088101&partnerID=8YFLogxK
U2 - 10.1109/TKDE.2009.69
DO - 10.1109/TKDE.2009.69
M3 - Article
AN - SCOPUS:75449088101
SN - 1041-4347
VL - 22
SP - 291
EP - 304
JO - IEEE Transactions on Knowledge and Data Engineering
JF - IEEE Transactions on Knowledge and Data Engineering
IS - 2
M1 - 4799783
ER -