TY - GEN
T1 - Information extraction for scholarly digital libraries
AU - Williams, Kyle
AU - Wu, Jian
AU - Wu, Zhaohui
AU - Giles, C. Lee
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/9/1
Y1 - 2016/9/1
N2 - Scholarly documents contain many data entities, such as titles, authors, affiliations, figures, and tables. These entities can be used to enhance digital library services through enhanced metadata and enable the development of new services and tools for interacting with and exploring scholarly data. However, in a world of scholarly big data, extracting these entities in a scalable, efficient and accurate manner can be challenging. In this tutorial, we introduce the broad field of information extraction for scholarly digital libraries. Drawing on our experience in running the CiteSeerX digital library, which has performed information extraction on over 7 million academic documents, we argue for the need for automatic information extraction, describe different approaches for performing information extraction, present tools and datasets that are readily available, and describe best practices and areas of research interest.
AB - Scholarly documents contain many data entities, such as titles, authors, affiliations, figures, and tables. These entities can be used to enhance digital library services through enhanced metadata and enable the development of new services and tools for interacting with and exploring scholarly data. However, in a world of scholarly big data, extracting these entities in a scalable, efficient and accurate manner can be challenging. In this tutorial, we introduce the broad field of information extraction for scholarly digital libraries. Drawing on our experience in running the CiteSeerX digital library, which has performed information extraction on over 7 million academic documents, we argue for the need for automatic information extraction, describe different approaches for performing information extraction, present tools and datasets that are readily available, and describe best practices and areas of research interest.
UR - http://www.scopus.com/inward/record.url?scp=84989871203&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84989871203&partnerID=8YFLogxK
U2 - 10.1145/2910896.2925430
DO - 10.1145/2910896.2925430
M3 - Conference contribution
AN - SCOPUS:84989871203
T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
SP - 287
EP - 288
BT - JCDL 2016 - Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 16th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2016
Y2 - 19 June 2016 through 23 June 2016
ER -