Improving Effectiveness Information Retrieval System Using Pseudo Irrelevance Feedback

Elvina, Rila Mandala
Published in: 2020 IEEE International Conference on Sustainable Engineering and Creative Computing (ICSECC), 2020-12-16
DOI: https://doi.org/10.1109/ICSECC51444.2020.9557550
Citation count: 0

Abstract

Pseudo relevance feedback (PRF) improves on the retrieval performance of relevance feedback. It assumes that the k highest-ranking documents in the first retrieval are relevant and extracts query expansion terms from them. The Rocchio algorithm is a classical algorithm for implementing relevance feedback in vector space models. It forms a new query that moves toward the centroid of the relevant documents and away from the centroid of the irrelevant documents. However, in the relevance feedback method, irrelevant documents are ignored. In this paper, we propose a method for selecting the pseudo irrelevance feedback (PIRF) document component that can be applied effectively to the Rocchio algorithm. Documents that rank highly but fall outside the k relevant documents, and that are dissimilar to the k relevant documents, can yield good query expansion when they are treated as irrelevant documents. The Rocchio algorithm that uses PRF for the relevant-document component and the proposed method for the irrelevant-document component is denoted Roc PRF PIRF (filter). Experiments on the CISI dataset, testing several values for the number of irrelevant documents, show that Roc PRF PIRF (filter) improves performance compared to the standard Rocchio algorithm and to the Rocchio algorithm with irrelevant documents but without the proposed method.
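
The query update described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the filter rule, so the similarity threshold `max_sim`, the function name `rocchio_prf_pirf`, the default weights `alpha`, `beta`, `gamma`, and the clipping of negative term weights are all hypothetical choices. The update follows the usual Rocchio form, new query = alpha * query + beta * centroid(relevant) - gamma * centroid(irrelevant).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors, dim):
    """Component-wise mean; the zero vector if the set is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio_prf_pirf(query, docs, k=2, n_irrelevant=2, max_sim=0.3,
                     alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update with pseudo relevant and pseudo irrelevant documents.

    All parameter names and defaults are assumptions made for this sketch.
    """
    dim = len(query)

    # First retrieval: rank every document by similarity to the query.
    ranked = sorted(range(len(docs)),
                    key=lambda i: cosine(query, docs[i]),
                    reverse=True)

    # PRF: treat the k highest-ranking documents as relevant.
    relevant = [docs[i] for i in ranked[:k]]

    # PIRF (assumed filter): walk down the ranking below the top k and keep
    # documents whose similarity to every pseudo relevant document is below
    # max_sim, until n_irrelevant documents have been collected.
    irrelevant = []
    for i in ranked[k:]:
        if len(irrelevant) == n_irrelevant:
            break
        if all(cosine(docs[i], r) < max_sim for r in relevant):
            irrelevant.append(docs[i])

    rel_c = centroid(relevant, dim)
    irr_c = centroid(irrelevant, dim)

    # Move toward the relevant centroid and away from the irrelevant one.
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, rel_c, irr_c)]

    # Assumption: negative term weights are clipped to zero.
    return [max(0.0, w) for w in updated]
```

Calling `rocchio_prf_pirf(query, docs)` with a query vector and a list of document vectors of the same dimension returns the expanded query vector, which can then be used for a second retrieval pass. Varying `n_irrelevant` corresponds to the experiment's variation in the number of irrelevant documents.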