段落感知搜索结果多样化

IF 5.4 2区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Zhan Su, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen
{"title":"段落感知搜索结果多样化","authors":"Zhan Su, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen","doi":"10.1145/3653672","DOIUrl":null,"url":null,"abstract":"<p>Research on search result diversification strives to enhance the variety of subtopics within the list of search results. Existing studies usually treat a document as a whole and represent it with one fixed-length vector. However, considering that a long document could cover different aspects of a query, using a single vector to represent the document is usually insufficient. To tackle this problem, we propose to exploit multiple passages to better represent documents in search result diversification. Different passages of each document may reflect different subtopics of the query and comparison among the passages can improve result diversity. Specifically, we segment the entire document into multiple passages and train a classifier to filter out the irrelevant ones. Then the document diversity is measured based on several passages that can offer the information needs of the query. Thereafter, we devise a passage-aware search result diversification framework that takes into account the topic information contained in the selected document sequence and candidate documents. The candidate documents’ novelty is evaluated based on their passages while considering the dynamically selected document sequence. We conducted experiments on a commonly utilized dataset, and the results indicate that our proposed method performs better than the most leading methods.</p>","PeriodicalId":50936,"journal":{"name":"ACM Transactions on Information Systems","volume":null,"pages":null},"PeriodicalIF":5.4000,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Passage-aware Search Result Diversification\",\"authors\":\"Zhan Su, Zhicheng Dou, Yutao Zhu, Ji-Rong Wen\",\"doi\":\"10.1145/3653672\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Research on search result diversification strives to enhance the variety of subtopics within the list of search results. Existing studies usually treat a document as a whole and represent it with one fixed-length vector. However, considering that a long document could cover different aspects of a query, using a single vector to represent the document is usually insufficient. To tackle this problem, we propose to exploit multiple passages to better represent documents in search result diversification. Different passages of each document may reflect different subtopics of the query and comparison among the passages can improve result diversity. Specifically, we segment the entire document into multiple passages and train a classifier to filter out the irrelevant ones. Then the document diversity is measured based on several passages that can offer the information needs of the query. Thereafter, we devise a passage-aware search result diversification framework that takes into account the topic information contained in the selected document sequence and candidate documents. The candidate documents’ novelty is evaluated based on their passages while considering the dynamically selected document sequence. We conducted experiments on a commonly utilized dataset, and the results indicate that our proposed method performs better than the most leading methods.</p>\",\"PeriodicalId\":50936,\"journal\":{\"name\":\"ACM Transactions on Information Systems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2024-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Information Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3653672\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Information Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3653672","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

有关搜索结果多样化的研究致力于提高搜索结果列表中子主题的多样性。现有研究通常将文档视为一个整体,用一个固定长度的向量来表示。然而,考虑到一篇长文档可能涵盖查询的不同方面,使用单一向量来表示文档通常是不够的。为了解决这个问题,我们建议在搜索结果多样化时利用多个段落来更好地表示文档。每个文档的不同段落可能反映查询的不同子主题,而段落之间的比较可以提高搜索结果的多样性。具体来说,我们将整个文档分割成多个段落,并训练分类器来过滤掉不相关的段落。然后,根据能满足查询信息需求的多个段落来衡量文档的多样性。之后,我们设计了一个段落感知搜索结果多样化框架,该框架考虑了所选文档序列和候选文档中包含的主题信息。在考虑动态选择文档序列的同时,根据候选文档的段落对其新颖性进行评估。我们在一个常用的数据集上进行了实验,结果表明我们提出的方法比最主要的方法性能更好。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Passage-aware Search Result Diversification

Research on search result diversification strives to enhance the variety of subtopics within the list of search results. Existing studies usually treat a document as a whole and represent it with one fixed-length vector. However, considering that a long document could cover different aspects of a query, using a single vector to represent the document is usually insufficient. To tackle this problem, we propose to exploit multiple passages to better represent documents in search result diversification. Different passages of each document may reflect different subtopics of the query and comparison among the passages can improve result diversity. Specifically, we segment the entire document into multiple passages and train a classifier to filter out the irrelevant ones. Then the document diversity is measured based on several passages that can offer the information needs of the query. Thereafter, we devise a passage-aware search result diversification framework that takes into account the topic information contained in the selected document sequence and candidate documents. The candidate documents’ novelty is evaluated based on their passages while considering the dynamically selected document sequence. We conducted experiments on a commonly utilized dataset, and the results indicate that our proposed method performs better than the most leading methods.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
ACM Transactions on Information Systems
ACM Transactions on Information Systems 工程技术-计算机:信息系统
CiteScore
9.40
自引率
14.30%
发文量
165
审稿时长
>12 weeks
期刊介绍: The ACM Transactions on Information Systems (TOIS) publishes papers on information retrieval (such as search engines, recommender systems) that contain: new principled information retrieval models or algorithms with sound empirical validation; observational, experimental and/or theoretical studies yielding new insights into information retrieval or information seeking; accounts of applications of existing information retrieval techniques that shed light on the strengths and weaknesses of the techniques; formalization of new information retrieval or information seeking tasks and of methods for evaluating the performance on those tasks; development of content (text, image, speech, video, etc) analysis methods to support information retrieval and information seeking; development of computational models of user information preferences and interaction behaviors; creation and analysis of evaluation methodologies for information retrieval and information seeking; or surveys of existing work that propose a significant synthesis. The information retrieval scope of ACM Transactions on Information Systems (TOIS) appeals to industry practitioners for its wealth of creative ideas, and to academic researchers for its descriptions of their colleagues'' work.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信