TY - GEN
T1 - Adaptive sorted neighborhood methods for efficient record linkage
AU - Yan, Su
AU - Lee, Dongwon
AU - Kan, Min Yen
AU - Giles, Lee C.
PY - 2007
Y1 - 2007
N2 - Traditionally, record linkage algorithms have played an important role in maintaining digital libraries - i.e., identifying matching citations or authors for consolidation in updating or integrating digital libraries. As such, a variety of record linkage algorithms have been developed and deployed successfully. Often, however, existing solutions have a set of parameters whose values are set by human experts off-line and are fixed during the execution. Since finding the ideal values of such parameters is not straightforward, or no such single ideal value even exists, the applicability of existing solutions to new scenarios or domains is greatly hampered. To remedy this problem, we argue that one can achieve significant improvement by adaptively and dynamically changing such parameters of record linkage algorithms. To validate our hypothesis, we take a classical record linkage algorithm, the sorted neighborhood method (SNM), and demonstrate how we can achieve improved accuracy and performance by adaptively changing its fixed sliding window size. Our claim is analytically and empirically validated using both real and synthetic data sets of digital libraries and other domains.
AB - Traditionally, record linkage algorithms have played an important role in maintaining digital libraries - i.e., identifying matching citations or authors for consolidation in updating or integrating digital libraries. As such, a variety of record linkage algorithms have been developed and deployed successfully. Often, however, existing solutions have a set of parameters whose values are set by human experts off-line and are fixed during the execution. Since finding the ideal values of such parameters is not straightforward, or no such single ideal value even exists, the applicability of existing solutions to new scenarios or domains is greatly hampered. To remedy this problem, we argue that one can achieve significant improvement by adaptively and dynamically changing such parameters of record linkage algorithms. To validate our hypothesis, we take a classical record linkage algorithm, the sorted neighborhood method (SNM), and demonstrate how we can achieve improved accuracy and performance by adaptively changing its fixed sliding window size. Our claim is analytically and empirically validated using both real and synthetic data sets of digital libraries and other domains.
UR - http://www.scopus.com/inward/record.url?scp=36348961379&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=36348961379&partnerID=8YFLogxK
U2 - 10.1145/1255175.1255213
DO - 10.1145/1255175.1255213
M3 - Conference contribution
AN - SCOPUS:36348961379
SN - 1595936440
SN - 9781595936448
T3 - Proceedings of the ACM International Conference on Digital Libraries
SP - 185
EP - 194
BT - Proceedings of the 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007
T2 - 7th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2007: Building and Sustaining the Digital Environment
Y2 - 18 June 2007 through 23 June 2007
ER -