Design Considerations for a Sustainable Scholarly Big Data Service

Jian Wu, Shaurya Rohatgi, Manoj K. Angadi, Kavya S. Puranik, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Advances in web programming techniques, such as Ajax and jQuery, and in datastores, such as Apache Solr and Elasticsearch, have made it much easier to deploy small- to medium-scale web-based search engines. However, developing a sustainable search engine that supports scholarly big data services remains challenging, often because of limited human resources and financial support. Such scenarios are typical in academic settings and small businesses. Here, we show how four key design decisions were made by trading off competing factors, such as performance, cost, and efficiency, when developing the Next Generation CiteSeerX (NGX), the successor of CiteSeerX, a pioneering digital library search engine that has served academic communities for more than two decades. This work extends our previous work in Wu et al. (2021) and discusses design considerations for infrastructure, web applications, indexing, and document filtering. These design considerations can be generalized to other web-based search engines of similar scale that are deployed in small-business or academic settings with limited resources.

Original language: English (US)
Title of host publication: FIRE 2022 - Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation
Editors: Debasis Ganguly, Surupendu Gangopadhyay, Mandar Mitra, Prasenjit Majumder
Publisher: Association for Computing Machinery
Pages: 83-87
Number of pages: 5
ISBN (Electronic): 9798400700231
State: Published - Dec 9 2022
Event: 14th Annual Forum for Information Retrieval Evaluation - Kolkata, India
Duration: Dec 9 2022 to Dec 13 2022

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 14th Annual Forum for Information Retrieval Evaluation
Country/Territory: India
City: Kolkata
Period: 12/9/22 to 12/13/22

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
