The Predictions of Performance Metrics in Information Retrieval: An Experimental Study

IF 1.1; Zone 4 (Computer Science); JCR Q4, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Malaysian Journal of Computer Science. Pub Date: 2021-12-31. DOI: 10.22452/mjcs.sp2021no2.3

Sinyinda Muwanei, Sri Devi Ravana, W. Hoo, D. Kunda

Citations: 0

Abstract

Information retrieval systems are widely used by people from all walks of life to meet diverse needs. The ability of these systems to return relevant information in response to user queries has therefore long concerned the information retrieval research community. Evaluating retrieval systems is critical to addressing this concern, and the most popular approach, in use for several decades, employs test collections. One limitation of this approach is the cost of creating relevance judgments. Recent research addressed this limitation by predicting performance metrics at high cut-off depths of documents from performance metrics computed at low cut-off depths. The open challenge is how to predict the precision and the non-cumulative gain performance metrics at high cut-off depths while using other performance metrics computed at low cut-off depths of at most 30 documents. This study addresses the challenge by investigating the predictability of performance metrics and proposing two approaches that predict the precision and the non-cumulative discounted gain performance metrics. The study shows that dataset shifts exist in the performance metrics computed from different test collections. Furthermore, the proposed approaches achieve better ranked correlations for the predicted performance metrics than existing research.
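To make the terms in the abstract concrete, the following is a minimal sketch, not the authors' method. It computes precision at a low and a high cut-off depth for a few systems, then measures how well the shallow scores preserve the deep system ranking. The abstract says only "ranked correlations", so the use of Kendall's tau here is an assumption. The system names, relevance labels, and the cut-off depths of 5 and 10 are hypothetical toy values.

```python
from itertools import combinations


def precision_at_k(ranked_relevance, k):
    """Precision at cut-off depth k for one ranked list of 0/1 relevance labels."""
    return sum(ranked_relevance[:k]) / k


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equally long score lists.

    Pairs tied in either list count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: binary relevance of the top 10 documents per system.
runs = {
    "sysA": [1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    "sysB": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "sysC": [0, 1, 0, 0, 0, 1, 0, 1, 1, 1],
}

names = sorted(runs)
shallow = [precision_at_k(runs[n], 5) for n in names]   # low cut-off depth
deep = [precision_at_k(runs[n], 10) for n in names]     # high cut-off depth

# Rank agreement between the shallow scores and the deep scores.
tau = kendall_tau(shallow, deep)
print(f"P@5:  {dict(zip(names, shallow))}")
print(f"P@10: {dict(zip(names, deep))}")
print(f"Kendall tau between P@5 and P@10 rankings: {tau:.3f}")
```

In this toy data the shallow scores rank the systems differently from the deep scores, which is the gap a prediction approach would try to close: a predictor of the deep metric would be judged by how close its rank correlation with the true deep scores comes to 1.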

Source journal

Malaysian Journal of Computer Science (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, THEORY & METHODS)

CiteScore: 2.20

Self-citation rate: 33.30%

Articles published: (no value given)

Review time: 7.5 months

About the journal: The Malaysian Journal of Computer Science (ISSN 0127-9084) has been published since 1985 by the Faculty of Computer Science and Information Technology, University of Malaya, and appears four times a year, in January, April, July and October. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. Rigorous reviews from the referees have helped maintain the journal's high standard. Its objectives are to promote the exchange of information and knowledge in research work and in new inventions and developments in Computer Science, and on the use of Information Technology towards the structuring of an information-rich society; and to assist academic staff from local and foreign universities, business and industrial sectors, government departments and academic institutions in publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is indexed and abstracted by Clarivate Analytics' Web of Science and Elsevier's Scopus.