Design Considerations for a Sustainable Scholarly Big Data Service

Jian Wu, Shaurya Rohatgi, Manoj K. Angadi, Kavya S. Puranik, C. Lee Giles
Published in: Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022-12-09
DOI: 10.1145/3574318.3574340 (https://doi.org/10.1145/3574318.3574340)

Abstract

Advances in web programming techniques, such as Ajax and jQuery, and in datastores, such as Apache Solr and Elasticsearch, have made it much easier to deploy small- to medium-scale web-based search engines. However, developing a sustainable search engine that supports scholarly big data services remains challenging, often because of limited human resources and financial support. Such constraints are typical of academic settings and small businesses. Here, we show how four key design decisions were made by trading off competing factors such as performance, cost, and efficiency when developing the Next Generation CiteSeerX (NGX), the successor of CiteSeerX, a pioneering digital library search engine that has served academic communities for more than two decades. This work extends our previous work in Wu et al. (2021) and discusses design considerations for infrastructure, web applications, indexing, and document filtering. These considerations generalize to other web-based search engines of similar scale that are deployed in small-business or academic settings with limited resources.