TY - GEN
T1 - Efficient Knowledge Graph Validation via Cross-Graph Representation Learning
AU - Wang, Yaqing
AU - Ma, Fenglong
AU - Gao, Jing
N1 - Funding Information:
The authors would like to thank the anonymous referees for their valuable comments and helpful suggestions. This work is supported in part by the US National Science Foundation under grants NSF-IIS 1747614 and NSF-IIS 1553411. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Publisher Copyright:
© 2020 ACM.
PY - 2020/10/19
Y1 - 2020/10/19
N2 - Recent advances in information extraction have motivated the automatic construction of huge Knowledge Graphs (KGs) by mining from large-scale text corpus. However, noisy facts are unavoidably introduced into KGs that could be caused by automatic extraction. To validate the correctness of facts (i.e., triplets) inside a KG, one possible approach is to map the triplets into vector representations by capturing the semantic meanings of facts. Although many representation learning approaches have been developed for knowledge graphs, these methods are not effective for validation. They usually assume that facts are correct, and thus may overfit noisy facts and fail to detect such facts. Towards effective KG validation, we propose to leverage an external human-curated KG as auxiliary information source to help detect the errors in a target KG. The external KG is built upon human-curated knowledge repositories and tends to have high precision. On the other hand, although the target KG built by information extraction from texts has low precision, it can cover new or domain-specific facts that are not in any human-curated repositories. To leverage external KG for validation, one intuitive approach is to find matched triplets between the target KG and the external KG. However, this approach can only apply to the small portion of triplets that are covered by both external and target KGs and is not useful for the validation of majority of the triplets. To tackle this challenging task, we propose a cross-graph representation learning framework, i.e., CrossVal, which can leverage an external KG to validate the facts in the target KG efficiently. This is achieved by embedding triplets based on their semantic meanings, drawing cross-KG negative samples and estimating a confidence score for each triplet based on its degree of correctness. We evaluate the proposed framework on datasets across different domains. Experimental results show that the proposed framework achieves the best performance compared with the state-of-the-art methods on large-scale KGs.
AB - Recent advances in information extraction have motivated the automatic construction of huge Knowledge Graphs (KGs) by mining from large-scale text corpus. However, noisy facts are unavoidably introduced into KGs that could be caused by automatic extraction. To validate the correctness of facts (i.e., triplets) inside a KG, one possible approach is to map the triplets into vector representations by capturing the semantic meanings of facts. Although many representation learning approaches have been developed for knowledge graphs, these methods are not effective for validation. They usually assume that facts are correct, and thus may overfit noisy facts and fail to detect such facts. Towards effective KG validation, we propose to leverage an external human-curated KG as auxiliary information source to help detect the errors in a target KG. The external KG is built upon human-curated knowledge repositories and tends to have high precision. On the other hand, although the target KG built by information extraction from texts has low precision, it can cover new or domain-specific facts that are not in any human-curated repositories. To leverage external KG for validation, one intuitive approach is to find matched triplets between the target KG and the external KG. However, this approach can only apply to the small portion of triplets that are covered by both external and target KGs and is not useful for the validation of majority of the triplets. To tackle this challenging task, we propose a cross-graph representation learning framework, i.e., CrossVal, which can leverage an external KG to validate the facts in the target KG efficiently. This is achieved by embedding triplets based on their semantic meanings, drawing cross-KG negative samples and estimating a confidence score for each triplet based on its degree of correctness. We evaluate the proposed framework on datasets across different domains. Experimental results show that the proposed framework achieves the best performance compared with the state-of-the-art methods on large-scale KGs.
UR - http://www.scopus.com/inward/record.url?scp=85095863616&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85095863616&partnerID=8YFLogxK
U2 - 10.1145/3340531.3411902
DO - 10.1145/3340531.3411902
M3 - Conference contribution
AN - SCOPUS:85095863616
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1595
EP - 1604
BT - CIKM 2020 - Proceedings of the 29th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020
Y2 - 19 October 2020 through 23 October 2020
ER -