TY - GEN
T1 - HESDK
T2 - 17th ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
AU - Wu, Jian
AU - Choudhury, Sagnik Ray
AU - Chiatti, Agnese
AU - Liang, Chen
AU - Giles, C. Lee
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2017/7/25
Y1 - 2017/7/25
AB - We investigate a variant of the problem of automatic keyphrase extraction from scientific documents, which we define as Scientific Domain Knowledge Entity (SDKE) extraction. Keyphrases are noun phrases important to the documents themselves. In contrast, an SDKE is text that refers to a concept and can be classified as a process, material, task, dataset, etc. An SDKE represents domain knowledge but is not necessarily important to the document in which it appears. Supervised keyphrase extraction algorithms using non-sequential classifiers and global measures of informativeness (PMI, tf-idf) have been used for this task. Another approach is to use sequential labeling algorithms with local context from a sentence, as is done in named entity recognition. We show that these two methods can complement each other and that a simple merging can improve the extraction accuracy by 5-7 percentage points. We further propose several heuristics to improve the extraction accuracy. Our preliminary experiments suggest that it is possible to improve the accuracy of the sequential learner itself by utilizing the predictions of the non-sequential model.
UR - http://www.scopus.com/inward/record.url?scp=85028007722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028007722&partnerID=8YFLogxK
U2 - 10.1109/JCDL.2017.7991580
DO - 10.1109/JCDL.2017.7991580
M3 - Conference contribution
AN - SCOPUS:85028007722
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
BT - 2017 ACM/IEEE Joint Conference on Digital Libraries, JCDL 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 June 2017 through 23 June 2017
ER -