TY - GEN
T1 - Extracting Semantic Relations for Scholarly Knowledge Base Construction
AU - Al-Zaidy, Rabah A.
AU - Giles, C. Lee
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/4/9
Y1 - 2018/4/9
N2 - The problem of information extraction from scientific articles, found as PDF documents in large digital repositories, is gaining more attention as the amount of research findings continues to grow. We propose a system to extract semantic relations among entities in scholarly articles by making use of external syntactic patterns and an iterative learner. While information extraction from scholarly documents has been studied before, it has focused mainly on the abstract and keywords. Our method extracts semantic entities as concepts and instances, along with their attributes, from the full body text of documents. We extract two types of relationships between concepts in the text using an iterative learning algorithm. External data sources from the web, such as the Microsoft Concept Graph, as well as query logs, are used to evaluate the quality of the extracted concepts and relations. The concepts are used to construct a scientific taxonomy covering the research content of the documents. To evaluate the system, we apply our approach to a set of 10k scholarly documents and conduct several evaluations to show the effectiveness of the proposed methods. We show that our system obtains a 23% improvement in precision over existing web IE tools when they are applied to scholarly documents.
AB - The problem of information extraction from scientific articles, found as PDF documents in large digital repositories, is gaining more attention as the amount of research findings continues to grow. We propose a system to extract semantic relations among entities in scholarly articles by making use of external syntactic patterns and an iterative learner. While information extraction from scholarly documents has been studied before, it has focused mainly on the abstract and keywords. Our method extracts semantic entities as concepts and instances, along with their attributes, from the full body text of documents. We extract two types of relationships between concepts in the text using an iterative learning algorithm. External data sources from the web, such as the Microsoft Concept Graph, as well as query logs, are used to evaluate the quality of the extracted concepts and relations. The concepts are used to construct a scientific taxonomy covering the research content of the documents. To evaluate the system, we apply our approach to a set of 10k scholarly documents and conduct several evaluations to show the effectiveness of the proposed methods. We show that our system obtains a 23% improvement in precision over existing web IE tools when they are applied to scholarly documents.
UR - http://www.scopus.com/inward/record.url?scp=85048394928&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048394928&partnerID=8YFLogxK
U2 - 10.1109/ICSC.2018.00017
DO - 10.1109/ICSC.2018.00017
M3 - Conference contribution
AN - SCOPUS:85048394928
T3 - Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018
SP - 56
EP - 63
BT - Proceedings - 12th IEEE International Conference on Semantic Computing, ICSC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th IEEE International Conference on Semantic Computing, ICSC 2018
Y2 - 31 January 2018 through 2 February 2018
ER -