Multi-stage enhanced representation learning for document reranking based on query view

Hai Liu, Xiaozhi Zhu, Yong Tang, Chaobo He, Tianyong Hao
DOI: 10.1007/s11280-024-01296-x
Journal: World Wide Web
Published: 2024-08-21 (Journal Article)
Citations: 0

Abstract

Large language models can implicitly extract informative semantic features from queries and candidate documents to achieve impressive reranking performance. However, such models rely on their large number of parameters to do so, and it is not known exactly what semantic information has been learned. In this paper, we propose a Multi-stage Enhanced Representation Learning method based on the query view (MERL), comprising an Intra-query stage and an Inter-query stage, to guide the model to explicitly learn the semantic relationship between a query and its documents. In the Intra-query training stage, a content-based contrastive learning module that excludes BERT's special [CLS] token is used to optimize the semantic similarity between the query and its relevant documents. In the Inter-query training stage, an entity-oriented masked query prediction task establishes a semantic relation between query-document pairs, and an Inter-query contrastive learning module extracts similar matching patterns across query-relevant document pairs. Extensive experiments on the MS MARCO passage ranking and TREC DL datasets show that MERL obtains significant improvements over baseline models with a small number of parameters.
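The Intra-query stage described above pairs a query with its relevant documents under a contrastive objective, pooling content-token embeddings while skipping the [CLS] slot. The following sketch illustrates that idea with a standard InfoNCE loss over mean-pooled token embeddings; it is a minimal illustration, not the paper's implementation, and the embedding dimension, temperature, and pooling choice are assumptions for demonstration.

```python
import numpy as np

def mean_pool_content(token_embs):
    """Mean-pool token embeddings, skipping position 0 (the [CLS] slot).

    Mirrors the abstract's idea of content-based representations that
    do not rely on BERT's special [CLS] token.
    """
    return token_embs[1:].mean(axis=0)

def info_nce_loss(query_vec, doc_vecs, positive_idx, temperature=0.05):
    """InfoNCE contrastive loss: pull the relevant document toward the
    query, push the in-batch negatives away.

    `temperature=0.05` is a hypothetical setting; the abstract does not
    state the paper's value.
    """
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q / temperature                      # similarity logits
    log_probs = sims - np.log(np.exp(sims).sum())   # log-softmax
    return -log_probs[positive_idx]

# Toy example: one query, one relevant document and two negatives.
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(8, 16))            # 8 tokens incl. [CLS], dim 16
docs_tokens = [rng.normal(size=(12, 16)) for _ in range(3)]

q = mean_pool_content(query_tokens)
docs = np.stack([mean_pool_content(t) for t in docs_tokens])
loss = info_nce_loss(q, docs, positive_idx=0)
print(f"contrastive loss: {float(loss):.4f}")
```

Minimizing this loss over many (query, relevant document, negatives) triples drives query and relevant-document representations together in embedding space, which is the effect the Intra-query stage targets.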

