Improving Pseudo Relevance Feedback with Term Relationship using Firefly Algorithm

Muhammad Fikri Hasani, Rila Mandala
Published in: 2020 IEEE International Conference on Sustainable Engineering and Creative Computing (ICSECC)
Publication date: 2020-12-16
DOI: 10.1109/ICSECC51444.2020.9557560 (https://doi.org/10.1109/ICSECC51444.2020.9557560)
Citations: 0

Abstract

When searching with an information retrieval (IR) system, the documents returned sometimes do not match the user's information need. Query expansion (QE) based on pseudo relevance feedback (PRF) tries to overcome this problem by adding to the query terms, taken from the top N ranked documents retrieved, that are expected to improve retrieval results. A previous study showed that using the firefly algorithm (FA) as the optimization method improves the performance of the IR system. However, that study weighted terms with the Rocchio function over the pseudo-relevant documents (PRD), so IR performance may degrade when the PRD contain few or no truly relevant documents. This study therefore scores terms by the term relationship between the query and the PRD, combined with the Rocchio algorithm. The results show that term relationship scoring, using either word co-occurrence or word similarity, improves the performance of the previously built IR system. Word co-occurrence with Jaccard performed best compared with the previous study and the other combinations. FA was able to choose the optimal terms even as the number of top N ranked documents increased. Furthermore, combining term relationship with the Rocchio algorithm converges faster than term relationship without the Rocchio algorithm.
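The abstract does not give the scoring formulas, so the following is only a minimal Python sketch of the general idea: scoring candidate expansion terms by Jaccard co-occurrence with the query terms and blending that score with a Rocchio-style weight. The function names, the `beta` and `mix` parameters, the normalization by the largest Rocchio weight, and the simplified Rocchio term (positive feedback only, raw term frequency) are all assumptions made for illustration, not the authors' method. The firefly-based term selection step is not shown.

```python
from collections import Counter


def jaccard_cooccurrence(term_a, term_b, docs):
    """Jaccard coefficient over the sets of documents that contain each term."""
    docs_a = {i for i, doc in enumerate(docs) if term_a in doc}
    docs_b = {i for i, doc in enumerate(docs) if term_b in doc}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def rocchio_weights(docs, beta=0.75):
    """Simplified Rocchio weight (assumption): beta times the mean raw
    term frequency over the pseudo-relevant documents."""
    totals = Counter()
    for doc in docs:
        totals.update(doc)
    return {term: beta * count / len(docs) for term, count in totals.items()}


def score_candidates(query, docs, mix=0.5):
    """Blend term relationship and Rocchio scores for non-query terms.

    `mix` is a hypothetical interpolation weight: 1.0 uses only the
    Jaccard co-occurrence score, 0.0 uses only the Rocchio weight.
    """
    rocchio = rocchio_weights(docs)
    max_rocchio = max(rocchio.values())
    scores = {}
    for term in rocchio:
        if term in query:
            continue  # only terms not already in the query are candidates
        relationship = sum(
            jaccard_cooccurrence(term, q, docs) for q in query
        ) / len(query)
        scores[term] = mix * relationship + (1 - mix) * rocchio[term] / max_rocchio
    return scores


if __name__ == "__main__":
    # Toy pseudo-relevant documents, already tokenized (made-up data).
    prd = [
        ["firefly", "algorithm", "optimization", "swarm"],
        ["firefly", "swarm", "optimization"],
        ["query", "expansion", "firefly"],
    ]
    ranked = sorted(
        score_candidates(["firefly", "algorithm"], prd).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for term, score in ranked:
        print(f"{term}: {score:.3f}")
```

In this toy setup, a term that appears in the same documents as the query terms is ranked above a term that is equally frequent but co-occurs with the query less often, which is the behavior the term relationship score is meant to add on top of a frequency-based Rocchio weight.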