TY - GEN
T1 - Design Considerations for a Sustainable Scholarly Big Data Service
AU - Wu, Jian
AU - Rohatgi, Shaurya
AU - Angadi, Manoj K.
AU - Puranik, Kavya S.
AU - Giles, C. Lee
N1 - Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/12/9
Y1 - 2022/12/9
AB - Advances in web programming techniques, such as Ajax and jQuery, and datastores, such as Apache Solr and Elasticsearch, have made it much easier to deploy small- to medium-scale web-based search engines. However, developing a sustainable search engine that supports scholarly big data services is still challenging, often because of limited human resources and financial support. Such scenarios are typical in academic settings or small businesses. Here, we showcase how four key design decisions were made by trading off competing factors such as performance, cost, and efficiency when developing the Next Generation CiteSeerX (NGX), the successor of CiteSeerX, a pioneering digital library search engine that has served academic communities for more than two decades. This work extends our previous work in Wu et al. (2021) and discusses design considerations for infrastructure, web applications, indexing, and document filtering. These design considerations can be generalized to other web-based search engines of a similar scale that are deployed in small business or academic settings with limited resources.
UR - http://www.scopus.com/inward/record.url?scp=85146646420&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146646420&partnerID=8YFLogxK
U2 - 10.1145/3574318.3574340
DO - 10.1145/3574318.3574340
M3 - Conference contribution
AN - SCOPUS:85146646420
T3 - ACM International Conference Proceeding Series
SP - 83
EP - 87
BT - FIRE 2022 - Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation
A2 - Ganguly, Debasis
A2 - Gangopadhyay, Surupendu
A2 - Mitra, Mandar
A2 - Majumder, Prasenjit
PB - Association for Computing Machinery
T2 - 14th Annual Forum for Information Retrieval Evaluation
Y2 - 9 December 2022 through 13 December 2022
ER -