Relevance similarity: an alternative means to monitor information retrieval systems.

Peng Dong, Marie Loh, Adrian Mondry
Biomedical digital libraries 2:6, published 2005-07-20. DOI: 10.1186/1742-5581-2-6
Cited by: 9

Abstract



Background: Relevance assessment is a major problem in the evaluation of information retrieval systems. The work presented here introduces a new parameter, "Relevance Similarity", for measuring the variation of relevance assessment. In a situation where individual assessments can be compared with a gold standard, this parameter is used to study the effect of such variation on the performance of a medical information retrieval system. In such a setting, Relevance Similarity is the ratio of assessors who rank a given document the same as the gold standard over the total number of assessors in the group.
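As a minimal sketch (not the authors' implementation), the definition above reduces to a fraction: for one document, count the assessors whose judgement matches the gold standard and divide by the group size.

```python
def relevance_similarity(assessments, gold):
    """Fraction of assessors whose relevance judgement matches the gold standard.

    assessments: list of relevance labels (e.g. True/False), one per assessor.
    gold: the gold-standard label for the same document.
    """
    return sum(a == gold for a in assessments) / len(assessments)

# Hypothetical example: 5 of 6 assessors agree with a gold label of "relevant".
print(relevance_similarity([True, True, True, False, True, True], True))  # 5/6
```

The study reports this ratio per retrieved topic, so in practice it would be computed once for each document in the result set.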

Methods: The study was carried out on a collection of Critically Appraised Topics (CATs). Twelve volunteers were divided into two groups according to their domain knowledge. They assessed the relevance of retrieved topics obtained by querying a meta-search engine with ten keywords related to medical science. Their assessments were compared to the gold standard assessment, and Relevance Similarities were calculated as the ratio of positive concordance with the gold standard for each topic.

Results: The similarity comparison among groups showed that a higher degree of agreement exists among evaluators with more subject knowledge. The performance of the retrieval system was not significantly different as a result of the variations in relevance assessment in this particular query set.

Conclusion: In assessment situations where evaluators can be compared to a gold standard, Relevance Similarity provides an alternative evaluation technique to the commonly used kappa scores, which may give paradoxically low scores in highly biased situations such as document repositories containing large quantities of relevant data.
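The "paradoxically low" kappa scores the conclusion refers to can be illustrated with a small sketch using hypothetical counts: when nearly every document in a repository is relevant, two assessors can agree on almost everything yet still receive a low Cohen's kappa, because chance agreement is also very high.

```python
# Hypothetical 2x2 agreement counts for two assessors over 100 documents:
# 95 both judge relevant, 1 both judge irrelevant, 2 disagree each way.
both_rel, both_irr, a_only, b_only = 95, 1, 2, 2
n = both_rel + both_irr + a_only + b_only

po = (both_rel + both_irr) / n                # observed agreement: 0.96
pa = (both_rel + a_only) / n                  # P(assessor A says relevant)
pb = (both_rel + b_only) / n                  # P(assessor B says relevant)
pe = pa * pb + (1 - pa) * (1 - pb)            # agreement expected by chance
kappa = (po - pe) / (1 - pe)                  # Cohen's kappa

print(f"agreement={po:.2f} kappa={kappa:.2f}")  # high agreement, low kappa
```

With these numbers, raw agreement is 0.96 while kappa is only about 0.31, which is the biased-prevalence situation where a concordance ratio such as Relevance Similarity remains interpretable.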
