TY - GEN
T1 - Financial entity record linkage with random forests
AU - Kim, Kunho
AU - Giles, C. Lee
N1 - Publisher Copyright:
© 2016 Copyright held by the owner/author(s).
PY - 2016/6/26
Y1 - 2016/6/26
N2 - Record linkage refers to the task of finding the same entity across different databases. We propose a machine-learning-based record linkage algorithm for financial entity databases. Record linkage on financial databases is essential for integrating information about a given financial entity, since those databases do not share a common unified identifier. Our algorithm works in two steps to determine whether a pair of records refers to the same entity. First, we check with proposed rules whether the record pair can be exactly matched after cleaning the entity name and address. Second, inspired by earlier work on author name disambiguation, we train a binary Random Forest classifier to decide the linkage. To reduce the computation and allow it to scale, this process is done only for candidate pairs selected by a proposed heuristic. Initial evaluation of precision, recall, and F1 measures on two different linking tasks in the Financial Entity Identification and Information Integration (FEIII) Challenge shows promising results.
AB - Record linkage refers to the task of finding the same entity across different databases. We propose a machine-learning-based record linkage algorithm for financial entity databases. Record linkage on financial databases is essential for integrating information about a given financial entity, since those databases do not share a common unified identifier. Our algorithm works in two steps to determine whether a pair of records refers to the same entity. First, we check with proposed rules whether the record pair can be exactly matched after cleaning the entity name and address. Second, inspired by earlier work on author name disambiguation, we train a binary Random Forest classifier to decide the linkage. To reduce the computation and allow it to scale, this process is done only for candidate pairs selected by a proposed heuristic. Initial evaluation of precision, recall, and F1 measures on two different linking tasks in the Financial Entity Identification and Information Integration (FEIII) Challenge shows promising results.
UR - http://www.scopus.com/inward/record.url?scp=84992759006&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84992759006&partnerID=8YFLogxK
U2 - 10.1145/2951894.2951908
DO - 10.1145/2951894.2951908
M3 - Conference contribution
AN - SCOPUS:84992759006
T3 - Proceedings of the ACM SIGMOD International Conference on Management of Data
BT - Proceedings of the 2nd International Workshop on Data Science for Macro-Modeling, DSMM 2016 - In conjunction with the ACM SIGMOD/PODS Conference
PB - Association for Computing Machinery
T2 - 2nd International Workshop on Data Science for Macro-Modeling, DSMM 2016
Y2 - 1 July 2016
ER -