TY - GEN
T1 - Document analysis and retrieval tasks in scientific digital libraries
AU - Gollapalli, Sujatha Das
AU - Caragea, Cornelia
AU - Li, Xiaoli
AU - Giles, C. Lee
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
AB - Machine Learning (ML) algorithms have opened up new possibilities for the acquisition and processing of documents in Information Retrieval (IR) systems. Indeed, it is now possible to automate several labor-intensive tasks related to documents such as categorization and entity extraction. Consequently, the application of machine learning techniques for various large-scale IR tasks has gathered significant research interest in both the ML and IR communities. This tutorial provides a reference summary of our research in applying machine learning techniques to diverse tasks in Digital Libraries (DL). Digital library portals are specialized IR systems that work on collections of documents related to particular domains. We focus on open-access, scientific digital libraries such as CiteSeerX, which involve several crawling, ranking, content analysis, and metadata extraction tasks. We elaborate on the challenges involved in these tasks and highlight how machine learning methods can successfully address these challenges.
UR - http://www.scopus.com/inward/record.url?scp=84951786317&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84951786317&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-25485-2_1
DO - 10.1007/978-3-319-25485-2_1
M3 - Conference contribution
AN - SCOPUS:84951786317
SN - 9783319254845
T3 - Communications in Computer and Information Science
SP - 3
EP - 20
BT - Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers
A2 - Braslavski, Pavel
A2 - Volkovich, Yana
A2 - Worring, Marcel
A2 - Karpov, Nikolay
A2 - Ignatov, Dmitry I.
PB - Springer Verlag
T2 - 8th Russian Summer School on Information Retrieval, RuSSIR 2014
Y2 - 18 August 2014 through 22 August 2014
ER -