Sinyinda Muwanei, Sri Devi Ravana, W. Hoo, D. Kunda
DOI: 10.22452/mjcs.sp2021no2.3
Journal: Malaysian Journal of Computer Science (JCR Q4, Computer Science, Artificial Intelligence; Impact Factor 1.1)
Publication date: 2021-12-31 (Journal Article)
THE PREDICTIONS OF PERFORMANCE METRICS IN INFORMATION RETRIEVAL: AN EXPERIMENTAL STUDY
Information retrieval systems are widely used by people from all walks of life to meet diverse needs. Hence, the ability of these systems to return relevant information in response to user queries has long been a concern of the information retrieval research community. Evaluation of retrieval systems is therefore critical, and the most widely used approach employs test collections; it has been the dominant evaluation approach in information retrieval for several decades. One of its limitations, however, is the costly creation of relevance judgments. Recent research addressed this limitation by predicting performance metrics at high cut-off depths of the document ranking from performance metrics computed at low cut-off depths. The challenge that remains is how to predict the precision and the non-cumulative discounted gain performance metrics at high cut-off depths while using other performance metrics computed at low cut-off depths of at most 30 documents. This study addresses that challenge by investigating the predictability of performance metrics and proposing two approaches that predict the precision and the non-cumulative discounted gain performance metrics. The study shows that dataset shifts exist in the performance metrics computed from different test collections. Furthermore, the proposed approaches yield higher rank correlations for the predicted performance metrics than existing research.
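The core idea described above, estimating an expensive deep-depth metric from a cheap shallow-depth one, can be sketched as follows. This is a minimal illustration only: the synthetic relevance judgments, the choice of precision as the metric, the depths (30 and 100), and the simple least-squares regression are all assumptions for demonstration, not the models actually proposed in the paper.

```python
import random

def precision_at_k(rels, k):
    """Fraction of relevant documents among the top-k ranked results."""
    return sum(rels[:k]) / k

def fit_line(xs, ys):
    """Closed-form ordinary least-squares fit y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

random.seed(0)
# Synthetic binary relevance lists for 50 topics, judged 100 documents deep.
topics = [[1 if random.random() < 0.3 else 0 for _ in range(100)]
          for _ in range(50)]

# Cheap metric: only the top 30 documents per topic need judging.
shallow = [precision_at_k(r, 30) for r in topics]
# Expensive metric we would like to avoid computing directly.
deep = [precision_at_k(r, 100) for r in topics]

# Learn the shallow-to-deep mapping, then predict the deep metric.
a, b = fit_line(shallow, deep)
predicted = [a + b * x for x in shallow]
```

In practice one would train such a mapping on test collections where deep judgments exist and apply it to new collections with shallow judgments only; the paper's point about dataset shift is that this transfer across collections is not straightforward.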
About the Journal:
The Malaysian Journal of Computer Science (ISSN 0127-9084) has been published four times a year, in January, April, July, and October, by the Faculty of Computer Science and Information Technology, University of Malaya, since 1985. Over the years, the journal has gained popularity and the number of paper submissions has increased steadily. Rigorous refereeing has helped to maintain the journal's high standard. Its objectives are to promote the exchange of information and knowledge on research, new inventions and developments in Computer Science, and the use of Information Technology towards the structuring of an information-rich society, and to assist academic staff from local and foreign universities, the business and industrial sectors, government departments, and academic institutions in publishing research results and studies in Computer Science and Information Technology through a scholarly publication. The journal is indexed and abstracted by Clarivate Analytics' Web of Science and Elsevier's Scopus.