From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing

Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, Erik Learned-Miller, Jaap Kamps
{"title":"From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing","authors":"Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, E. Learned-Miller, J. Kamps","doi":"10.1145/3269206.3271800","DOIUrl":null,"url":null,"abstract":"The availability of massive data and computing power allowing for effective data driven neural approaches is having a major impact on machine learning and information retrieval research, but these models have a basic problem with efficiency. Current neural ranking models are implemented as multistage rankers: for efficiency reasons, the neural model only re-ranks the top ranked documents retrieved by a first-stage efficient ranker in response to a given query. Neural ranking models learn dense representations causing essentially every query term to match every document term, making it highly inefficient or intractable to rank the whole collection. The reliance on a first stage ranker creates a dual problem: First, the interaction and combination effects are not well understood. Second, the first stage ranker serves as a \"gate-keeper\" or filter, effectively blocking the potential of neural models to uncover new relevant documents. In this work, we propose a standalone neural ranking model (SNRM) by introducing a sparsity property to learn a latent sparse representation for each query and document. This representation captures the semantic relationship between the query and documents, but is also sparse enough to enable constructing an inverted index for the whole collection. We parameterize the sparsity of the model to yield a retrieval model as efficient as conventional term based models. Our model gains in efficiency without loss of effectiveness: it not only outperforms the existing term matching baselines, but also performs similarly to the recent re-ranking based neural models with dense representations. Our model can also take advantage of pseudo-relevance feedback for further improvements. More generally, our results demonstrate the importance of sparsity in neural IR models and show that dense representations can be pruned effectively, giving new insights about essential semantic features and their distributions.","PeriodicalId":331886,"journal":{"name":"Proceedings of the 27th ACM International Conference on Information and Knowledge Management","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"153","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 27th ACM International Conference on Information and Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3269206.3271800","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 153

Abstract

The availability of massive data and computing power, which enables effective data-driven neural approaches, is having a major impact on machine learning and information retrieval research, but these models suffer from a basic efficiency problem. Current neural ranking models are implemented as multi-stage rankers: for efficiency reasons, the neural model only re-ranks the top-ranked documents retrieved by an efficient first-stage ranker in response to a given query. Neural ranking models learn dense representations that cause essentially every query term to match every document term, making it highly inefficient or intractable to rank the whole collection. The reliance on a first-stage ranker creates a dual problem. First, the interaction and combination effects of the two stages are not well understood. Second, the first-stage ranker serves as a "gatekeeper" or filter, effectively blocking the potential of neural models to uncover new relevant documents. In this work, we propose a standalone neural ranking model (SNRM) that introduces a sparsity property to learn a latent sparse representation for each query and document. This representation captures the semantic relationship between queries and documents, yet is sparse enough to enable constructing an inverted index for the whole collection. We parameterize the sparsity of the model to yield a retrieval model as efficient as conventional term-based models. Our model gains efficiency without loss of effectiveness: it not only outperforms existing term-matching baselines, but also performs similarly to recent re-ranking-based neural models with dense representations. Our model can also take advantage of pseudo-relevance feedback for further improvements. More generally, our results demonstrate the importance of sparsity in neural IR models and show that dense representations can be pruned effectively, giving new insights about essential semantic features and their distributions.
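To make the mechanism concrete, below is a minimal PyTorch sketch of the two ideas the abstract describes: an encoder whose final ReLU layer, trained with an L1 penalty, produces high-dimensional but mostly-zero vectors for queries and documents, and an inverted index keyed on the nonzero latent dimensions. The layer sizes, loss weights, and helper names (SparseEncoder, training_loss, build_inverted_index, retrieve) are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch of the SNRM idea: a ReLU-activated encoder with an L1 penalty
# yields sparse latent vectors, whose nonzero dimensions play the role of
# "terms" in an inverted index. Sizes and weights below are assumptions.
from collections import defaultdict

import torch
import torch.nn as nn

LATENT_DIM = 2000  # wide output layer; ReLU drives most dims to exactly 0


class SparseEncoder(nn.Module):
    def __init__(self, emb_dim: int = 100, latent_dim: int = LATENT_DIM):
        super().__init__()
        # Small stack ending in a wide ReLU layer (assumed sizes).
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 300), nn.ReLU(),
            nn.Linear(300, latent_dim), nn.ReLU(),  # final ReLU => sparsity
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def training_loss(q, d_pos, d_neg, l1_weight: float = 1e-4):
    """Pairwise hinge loss plus an L1 penalty that controls sparsity."""
    s_pos = (q * d_pos).sum(-1)  # dot-product retrieval score
    s_neg = (q * d_neg).sum(-1)
    hinge = torch.relu(1.0 - s_pos + s_neg).mean()
    l1 = (q.abs() + d_pos.abs() + d_neg.abs()).sum(-1).mean()
    return hinge + l1_weight * l1


def build_inverted_index(doc_reprs: torch.Tensor):
    """Postings: latent dim -> [(doc_id, weight)], nonzero weights only."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(doc_reprs):
        for dim in vec.nonzero(as_tuple=True)[0].tolist():
            index[dim].append((doc_id, vec[dim].item()))
    return index


def retrieve(query_repr: torch.Tensor, index, k: int = 10):
    """Score only documents sharing a nonzero dimension with the query."""
    scores = defaultdict(float)
    for dim in query_repr.nonzero(as_tuple=True)[0].tolist():
        for doc_id, w in index.get(dim, []):
            scores[doc_id] += query_repr[dim].item() * w
    return sorted(scores.items(), key=lambda kv: -kv[1])[:k]
```

Scoring touches only the postings of latent dimensions on which the query is nonzero, so retrieval cost behaves like that of a conventional term-based inverted index; raising the l1_weight knob trades effectiveness for a sparser, cheaper index, which is what the abstract means by parameterizing the sparsity of the model.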