Knowledge-Guided Paraphrase Identification

Haoyu Wang, Fenglong Ma, Yaqing Wang, Jing Gao

Research output: Chapter in Book/Report/Conference proceedingConference contribution

9 Scopus citations

Abstract

Paraphrase identification (PI), a fundamental task in natural language processing, is to identify whether two sentences express the same or similar meaning, which is a binary classification problem. Recently, BERT-like pretrained language models have been a popular choice for the frameworks of various PI models, but almost all existing methods consider general domain text. When these approaches are applied to a specific domain, existing models cannot make accurate predictions due to the lack of professional knowledge. In light of this challenge, we propose a novel framework, namely Knowing, which can leverage the external unstructured Wikipedia knowledge to accurately identify paraphrases. We propose to mine outline knowledge of concepts related to given sentences from Wikipedia via BM25 model. After retrieving related outline knowledge, Knowing makes predictions based on both the semantic information of two sentences and the outline knowledge. Besides, we propose a gating mechanism to aggregate the semantic information-based prediction and the knowledge-based prediction. Extensive experiments are conducted on two public datasets: PARADE (a computer science domain dataset) and clinicalSTS2019 (a biomedical domain dataset). The results show that the proposed Knowing outperforms state-ofthe-art methods.

Original languageEnglish (US)
Title of host publicationFindings of the Association for Computational Linguistics, Findings of ACL
Subtitle of host publicationEMNLP 2021
EditorsMarie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-Tau Yih
PublisherAssociation for Computational Linguistics (ACL)
Pages843-853
Number of pages11
ISBN (Electronic)9781955917100
StatePublished - 2021
Event2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021 - Punta Cana, Dominican Republic
Duration: Nov 7 2021Nov 11 2021

Publication series

NameFindings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Conference

Conference2021 Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021
Country/TerritoryDominican Republic
CityPunta Cana
Period11/7/2111/11/21

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Knowledge-Guided Paraphrase Identification'. Together they form a unique fingerprint.

Cite this