Measuring the retrievability of digital library content using analytics data

Impact factor 4.3; Zone 2 (Management); JCR Q2, Computer Science, Information Systems

Journal of the Association for Information Science and Technology. Pub Date: 2024-03-19. DOI: 10.1002/asi.24886

Hamed Jahani, Leif Azzopardi, Mark Sanderson

Volume 75, Issue 11, pages 1233-1248. Publication type: Journal Article. Full text: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24886

Citations: 0

Abstract

Digital libraries aim to provide value to users by housing content that is accessible and searchable. Often such access is afforded through external web search engines. In this article, we measure how easily digital library content can be retrieved (i.e., how retrievable it is) through a well-known search engine (Google) using its analytics platforms. Using two measures of document retrievability, we contrast our results with simulation-based studies that employed synthetic query sets. We determine that estimating the retrievability of content given a digital library index is not a strong predictor of how retrievable the content is in practice (via external search engines). Retrievability established the notion that search algorithms can be biased. In our work, we find that while such bias is present, much of the variation in retrievability appears to be strongly influenced by the queries submitted to the library, a side of retrievability less examined in past work.
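The abstract does not name its two retrievability measures. As a rough illustration only, the sketch below uses the two forms common in the retrievability literature: a cumulative score (a document gains credit each time it appears at or above a rank cutoff) and a gravity-based score (credit decays with rank). It also computes a Gini coefficient, a common way to summarise how unevenly such scores are spread. The record layout `(query, doc_id, rank, weight)`, the function names, and the sample values are all assumptions made for this example, not the authors' data or method.

```python
from collections import defaultdict


def cumulative_scores(records, doc_ids, cutoff=10):
    """Cumulative retrievability: add `weight` whenever rank <= cutoff.

    `records` is an iterable of (query, doc_id, rank, weight) tuples
    (a hypothetical layout; an analytics export would need mapping to it).
    """
    scores = defaultdict(float, {d: 0.0 for d in doc_ids})
    for _query, doc_id, rank, weight in records:
        if rank <= cutoff:
            scores[doc_id] += weight
    return dict(scores)


def gravity_scores(records, doc_ids, beta=1.0):
    """Gravity-based retrievability: add weight / rank**beta at every rank."""
    scores = defaultdict(float, {d: 0.0 for d in doc_ids})
    for _query, doc_id, rank, weight in records:
        scores[doc_id] += weight / rank ** beta
    return dict(scores)


def gini(values):
    """Gini coefficient of non-negative scores (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Made-up sample: ranks here are positions in a result list.
records = [
    ("digital library", "d1", 1, 1.0),
    ("digital library", "d2", 3, 1.0),
    ("retrievability", "d1", 2, 1.0),
    ("retrievability", "d3", 12, 1.0),
]
docs = ["d1", "d2", "d3", "d4"]  # d4 is never retrieved and scores zero

cumulative = cumulative_scores(records, docs, cutoff=10)
gravity = gravity_scores(records, docs, beta=1.0)
inequality = gini(cumulative.values())
```

Documents that never appear in the records are initialised to zero on purpose: leaving them out would understate how unequal the distribution is.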

Source journal

Journal of the Association for Information Science and Technology (Computer Science, Information Systems)

CiteScore: 8.30

Self-citation rate: 8.60%

Articles published: 115

About the journal: The Journal of the Association for Information Science and Technology (JASIST) is a leading international forum for peer-reviewed research in information science. For more than half a century, JASIST has provided intellectual leadership by publishing original research that focuses on the production, discovery, recording, storage, representation, retrieval, presentation, manipulation, dissemination, use, and evaluation of information and on the tools and techniques associated with these processes. The Journal welcomes rigorous work of an empirical, experimental, ethnographic, conceptual, historical, socio-technical, policy-analytic, or critical-theoretical nature. JASIST also commissions in-depth review articles (“Advances in Information Science”) and reviews of print and other media.