Document analysis and retrieval tasks in scientific digital libraries

Sujatha Das Gollapalli, Cornelia Caragea, Xiaoli Li, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution


Machine Learning (ML) algorithms have opened up new possibilities for the acquisition and processing of documents in Information Retrieval (IR) systems. Indeed, it is now possible to automate several labor-intensive tasks related to documents such as categorization and entity extraction. Consequently, the application of machine learning techniques for various large-scale IR tasks has gathered significant research interest in both the ML and IR communities. This tutorial provides a reference summary of our research in applying machine learning techniques to diverse tasks in Digital Libraries (DL). Digital library portals are specialized IR systems that work on collections of documents related to particular domains. We focus on open-access, scientific digital libraries such as CiteSeerX, which involve several crawling, ranking, content analysis, and metadata extraction tasks. We elaborate on the challenges involved in these tasks and highlight how machine learning methods can successfully address these challenges.

Original language: English (US)
Title of host publication: Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers
Editors: Pavel Braslavski, Yana Volkovich, Marcel Worring, Nikolay Karpov, Dmitry I. Ignatov
Publisher: Springer Verlag
Number of pages: 18
ISBN (Print): 9783319254845
State: Published - 2015
Event: 8th Russian Summer School on Information Retrieval, RuSSIR 2014 - Nizhniy Novgorod, Russian Federation
Duration: Aug 18 2014 to Aug 22 2014

Publication series

Name: Communications in Computer and Information Science
ISSN (Print): 1865-0929


Other: 8th Russian Summer School on Information Retrieval, RuSSIR 2014
Country/Territory: Russian Federation
City: Nizhniy Novgorod

All Science Journal Classification (ASJC) codes

  • Computer Science (all)
  • Mathematics (all)

