Diversifying Search Result Leveraging Aspect-based Query Expansion

Shajalal, Muhammad Anwarul Azim Masaki Aono
International Journal of New Computer Architectures and Their Applications
DOI: 10.17781/P002433 (https://doi.org/10.17781/P002433)

Abstract

Web search queries are short and ambiguous, and they tend to have multiple underlying interpretations. Query expansion is a prominent method for reformulating such queries so that a set of relevant documents can be retrieved. In this paper, we propose an aspect-based query expansion technique for diversified document retrieval. First, query suggestions and completions are retrieved from major commercial search engines. A frequent phrase-based soft clustering algorithm is then applied to group similar retrieved candidates into clusters, where each cluster represents a different query aspect. The expansion terms are selected from the cluster label generated for each cluster. To estimate the relevance between the expanded query and the documents, multiple new lexical and semantic features are introduced using content information and a word-embedding model, respectively. Finally, a linear ranking approach is employed to re-rank the documents retrieved for the original query using the extracted features. We conduct experiments on the ClueWeb09 document collection using TREC 2012 Web Track queries. The experimental results demonstrate that the proposed aspect-based query expansion method effectively diversifies the retrieved documents and outperforms the baseline and some known related methods in terms of the diversity metrics ERR-IA, α-nDCG and NRBP at a cutoff of 20.
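To make the evaluation criterion concrete, the following is a minimal sketch of α-nDCG at a cutoff k, one of the diversity metrics named in the abstract. It is not taken from the paper: the function names, the toy aspect judgments, and the choice of α = 0.5 are assumptions for illustration, and the ideal ranking is built greedily, which is the usual approximation because the exact ideal ordering is expensive to compute.

```python
import math
from typing import Dict, List, Set


def novelty_gain(aspects: Set[str], seen: Dict[str, int], alpha: float) -> float:
    """Gain of a document: each aspect counts less the more often it was already covered."""
    return sum((1.0 - alpha) ** seen.get(a, 0) for a in aspects)


def alpha_dcg(ranking: List[str], judgments: Dict[str, Set[str]],
              k: int = 20, alpha: float = 0.5) -> float:
    """Alpha-DCG@k with a log2(rank + 1) position discount."""
    seen: Dict[str, int] = {}  # aspect -> number of higher-ranked documents covering it
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        aspects = judgments.get(doc, set())
        score += novelty_gain(aspects, seen, alpha) / math.log2(rank + 1)
        for a in aspects:
            seen[a] = seen.get(a, 0) + 1
    return score


def greedy_ideal(judgments: Dict[str, Set[str]], k: int, alpha: float) -> List[str]:
    """Greedy approximation of the ideal ranking: always pick the highest-gain document next."""
    remaining = sorted(judgments)  # sorted for deterministic tie-breaking
    seen: Dict[str, int] = {}
    ideal: List[str] = []
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda d: novelty_gain(judgments[d], seen, alpha))
        remaining.remove(best)
        ideal.append(best)
        for a in judgments[best]:
            seen[a] = seen.get(a, 0) + 1
    return ideal


def alpha_ndcg(ranking: List[str], judgments: Dict[str, Set[str]],
               k: int = 20, alpha: float = 0.5) -> float:
    """Alpha-nDCG@k: alpha-DCG of the ranking divided by that of the greedy ideal ranking."""
    ideal_score = alpha_dcg(greedy_ideal(judgments, k, alpha), judgments, k, alpha)
    return alpha_dcg(ranking, judgments, k, alpha) / ideal_score if ideal_score > 0 else 0.0


# Hypothetical judgments: which query aspects each document covers.
toy_judgments = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}
diverse = ["d1", "d3", "d2"]    # covers both aspects in the top two positions
redundant = ["d1", "d2", "d3"]  # repeats aspect "a" before reaching aspect "b"

print(alpha_ndcg(diverse, toy_judgments))
print(alpha_ndcg(redundant, toy_judgments))
```

In this toy setting the ranking that reaches a second aspect earlier scores higher, which is the behavior a diversification method is meant to produce.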