Pseudo-Relevance Feedback Driven for XML Query Expansion

Minjuan Zhong, Changxuan Wan
Journal: J. Convergence Inf. Technol. (journal article)
Published: 2010-11-30
DOI: 10.4156/JCIT.VOL5.ISSUE9.15 (https://doi.org/10.4156/JCIT.VOL5.ISSUE9.15)
Citations: 4

Abstract

Pseudo-relevance feedback is widely regarded as an effective technique for automatic query expansion. However, a recent study has shown that traditional pseudo-relevance feedback may introduce topic drift and thereby harm retrieval performance. It is therefore crucial to identify good feedback documents from which useful expansion terms can be drawn. Compared with traditional query expansion, XML query expansion requires not only content expansion but also structural expansion. This paper presents a solution for both identifying related documents and selecting good expansion information in the form of new content and path constraints. A naive document similarity measure that incorporates XML semantic features is proposed. Based on this measure, the k-median clustering algorithm is first applied to find a set of related documents. Query expansion is then performed in two steps, using only this set of related documents: in the first step, a key phrase extraction algorithm expands the original query; in the second step, structural expansion is carried out based on the expanded key phrases. Finally, a full-fledged content-and-structure query expression that represents the user's intention is formulated. Experimental results on the IEEE CS collection show that the proposed method effectively reduces topic drift and achieves better retrieval quality.
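
The pipeline described in the abstract (cluster the feedback documents, keep the related ones, expand content, then expand structure) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: Jaccard overlap of term sets stands in for the proposed similarity measure, a tiny k-medoids loop stands in for k-median clustering, raw document frequency stands in for key phrase extraction, and the most frequent path of each term stands in for structural expansion. The names `similarity`, `k_medoids`, `expand`, and the sample `docs` are hypothetical.

```python
from collections import Counter

# Hypothetical feedback documents: each is a list of (path, term) pairs.
docs = [
    [("/article/title", "xml"), ("/article/body/sec", "retrieval"), ("/article/body/sec", "query")],
    [("/article/title", "xml"), ("/article/body/sec", "query"), ("/article/body/sec", "expansion")],
    [("/article/title", "xml"), ("/article/body/sec", "retrieval"), ("/article/body/sec", "expansion")],
    [("/article/title", "protein"), ("/article/body/sec", "folding")],
    [("/article/title", "protein"), ("/article/body/sec", "folding"), ("/article/body/sec", "energy")],
]

def terms(doc):
    """Set of terms in a document, ignoring paths."""
    return {term for _, term in doc}

def similarity(a, b):
    """Jaccard overlap of term sets (assumed stand-in for the paper's measure)."""
    ta, tb = terms(a), terms(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def k_medoids(docs, k, iterations=10):
    """Small k-medoids loop; returns {medoid index: [member indices]}."""
    medoids = list(range(k))  # deterministic start: the first k documents
    clusters = {}
    for _ in range(iterations):
        clusters = {m: [] for m in medoids}
        for i, doc in enumerate(docs):
            # A medoid always stays in its own cluster, so no cluster is empty.
            best = i if i in clusters else max(
                medoids, key=lambda m: similarity(doc, docs[m]))
            clusters[best].append(i)
        # New medoid: the member most similar to the rest of its cluster.
        new_medoids = [
            max(members, key=lambda c: sum(similarity(docs[c], docs[j]) for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return clusters

def expand(query_terms, docs, k=2, n_terms=2):
    """Return {term: most frequent path} for the original and the added terms."""
    q = set(query_terms)
    clusters = k_medoids(docs, k)
    # Keep only the cluster whose medoid shares the most terms with the query.
    best = max(clusters, key=lambda m: len(q & terms(docs[m])))
    related = [docs[i] for i in clusters[best]]
    # Content expansion: new terms ranked by document frequency, ties by name.
    counts = Counter(t for d in related for t in terms(d) - q)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    new_terms = [t for t, _ in ranked[:n_terms]]
    # Structural expansion: attach the most frequent path to every term.
    expanded = {}
    for t in list(query_terms) + new_terms:
        paths = Counter(p for d in related for p, term in d if term == t)
        expanded[t] = max(sorted(paths), key=paths.get) if paths else None
    return expanded

result = expand(["xml", "query"], docs)
for term, path in result.items():
    print(term, path)
```

On this toy data the two protein documents fall into their own cluster and contribute nothing to the expansion, which is the effect the abstract attributes to restricting feedback to related documents.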