TY - GEN
T1 - Keyphrase Extraction in Scholarly Digital Library Search Engines
AU - Patel, Krutarth
AU - Caragea, Cornelia
AU - Wu, Jian
AU - Giles, C. Lee
N1 - Funding Information:
We thank the National Science Foundation (NSF) for supporting this research through grants CNS-1853919, IIS-1914575, and IIS-1813571. Any opinions, findings, and conclusions expressed here are those of the authors and do not necessarily reflect the views of NSF. We also thank our anonymous reviewers for their constructive feedback.
Publisher Copyright:
© 2020, Springer Nature Switzerland AG.
PY - 2020
Y1 - 2020
AB - Scholarly digital libraries provide access to scientific publications and are useful resources for researchers who search for literature on specific subject areas. CiteSeerX is an example of such a digital library search engine; it provides access to more than 10 million academic documents and has nearly one million users and three million hits per day. Artificial Intelligence (AI) technologies are used in many components of CiteSeerX, including Web crawling, document ingestion, and metadata extraction. CiteSeerX also uses an unsupervised algorithm called noun phrase chunking (NP-Chunking) to extract keyphrases from documents. However, NP-Chunking often extracts many unimportant noun phrases. In this paper, we investigate and contrast three supervised keyphrase extraction models to explore their deployment in CiteSeerX for extracting high-quality keyphrases. To perform user evaluations on the keyphrases predicted by different models, we integrate a voting interface into CiteSeerX. We describe the development and deployment of the keyphrase extraction models and their maintenance requirements.
UR - http://www.scopus.com/inward/record.url?scp=85092126407&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85092126407&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-59618-7_12
DO - 10.1007/978-3-030-59618-7_12
M3 - Conference contribution
AN - SCOPUS:85092126407
SN - 9783030596170
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 179
EP - 196
BT - Web Services – ICWS 2020 - 27th International Conference, Held as Part of the Services Conference Federation, SCF 2020, Proceedings
A2 - Ku, Wei-Shinn
A2 - Kanemasa, Yasuhiko
A2 - Serhani, Mohamed Adel
A2 - Zhang, Liang-Jie
PB - Springer Science and Business Media Deutschland GmbH
T2 - 27th International Conference on Web Services, ICWS 2020, held as part of the Services Conference Federation, SCF 2020
Y2 - 18 September 2020 through 20 September 2020
ER -