Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control

Thong Nguyen, Mariya Hendriksen, Andrew Yates, M. D. Rijke
{"title":"利用概率扩展控制进行多模态学习稀疏检索","authors":"Thong Nguyen, Mariya Hendriksen, Andrew Yates, M. D. Rijke","doi":"10.48550/arXiv.2402.17535","DOIUrl":null,"url":null,"abstract":"Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors that can be indexed and retrieved efficiently with an inverted index. We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval. While LSR has seen success in text retrieval, its application in multimodal retrieval remains underexplored. Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets. Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors. We address issues of high dimension co-activation and semantic deviation through a new training algorithm, using Bernoulli random variables to control query expansion. Experiments with two dense models (BLIP, ALBEF) and two datasets (MSCOCO, Flickr30k) show that our proposed algorithm effectively reduces co-activation and semantic deviation. Our best-performing sparsified model outperforms state-of-the-art text-image LSR models with a shorter training time and lower GPU memory requirements. Our approach offers an effective solution for training LSR retrieval models in multimodal settings. Our code and model checkpoints are available at github.com/thongnt99/lsr-multimodal","PeriodicalId":126309,"journal":{"name":"European Conference on Information Retrieval","volume":"41 4","pages":"448-464"},"PeriodicalIF":0.0000,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control\",\"authors\":\"Thong Nguyen, Mariya Hendriksen, Andrew Yates, M. D. Rijke\",\"doi\":\"10.48550/arXiv.2402.17535\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors that can be indexed and retrieved efficiently with an inverted index. We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval. While LSR has seen success in text retrieval, its application in multimodal retrieval remains underexplored. Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets. Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors. We address issues of high dimension co-activation and semantic deviation through a new training algorithm, using Bernoulli random variables to control query expansion. Experiments with two dense models (BLIP, ALBEF) and two datasets (MSCOCO, Flickr30k) show that our proposed algorithm effectively reduces co-activation and semantic deviation. Our best-performing sparsified model outperforms state-of-the-art text-image LSR models with a shorter training time and lower GPU memory requirements. Our approach offers an effective solution for training LSR retrieval models in multimodal settings. 
Our code and model checkpoints are available at github.com/thongnt99/lsr-multimodal\",\"PeriodicalId\":126309,\"journal\":{\"name\":\"European Conference on Information Retrieval\",\"volume\":\"41 4\",\"pages\":\"448-464\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-02-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"European Conference on Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2402.17535\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Conference on Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2402.17535","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors that can be indexed and retrieved efficiently with an inverted index. We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval. While LSR has seen success in text retrieval, its application in multimodal retrieval remains underexplored. Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets. Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors. We address issues of high dimension co-activation and semantic deviation through a new training algorithm, using Bernoulli random variables to control query expansion. Experiments with two dense models (BLIP, ALBEF) and two datasets (MSCOCO, Flickr30k) show that our proposed algorithm effectively reduces co-activation and semantic deviation. Our best-performing sparsified model outperforms state-of-the-art text-image LSR models with a shorter training time and lower GPU memory requirements. Our approach offers an effective solution for training LSR retrieval models in multimodal settings. Our code and model checkpoints are available at github.com/thongnt99/lsr-multimodal
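The abstract describes projecting a frozen dense encoder's output into a vocabulary-sized sparse lexical vector and using Bernoulli random variables to control how many expansion terms the query activates. The PyTorch sketch below only illustrates that general idea under assumptions of my own: the linear projection head, the log1p/ReLU sparsification, the `expansion_prob` parameter, and the `input_term_mask` argument are hypothetical and are not taken from the paper; the authors' actual implementation is available at github.com/thongnt99/lsr-multimodal.

```python
# Illustrative sketch only -- not the authors' implementation.
import torch
import torch.nn as nn


class SparseProjectionHead(nn.Module):
    """Maps a frozen dense embedding to a |V|-dimensional sparse lexical vector.

    Hypothetical module: the projection layer, the sparsifying activation, and
    the Bernoulli gate on expansion terms are assumptions made to illustrate
    probabilistic expansion control as described in the abstract.
    """

    def __init__(self, dense_dim: int, vocab_size: int, expansion_prob: float = 0.5):
        super().__init__()
        self.proj = nn.Linear(dense_dim, vocab_size)
        self.expansion_prob = expansion_prob  # keep-probability of the Bernoulli gate

    def forward(self, dense_vec: torch.Tensor, input_term_mask: torch.Tensor) -> torch.Tensor:
        # Sparsify with ReLU + log saturation, a common choice in learned sparse retrieval.
        weights = torch.log1p(torch.relu(self.proj(dense_vec)))  # (batch, |V|)

        if self.training:
            # Expansion terms (vocabulary entries absent from the input text) are kept
            # only with probability expansion_prob; original input terms always pass.
            # This gate is a simplified stand-in for the paper's expansion control.
            gate = torch.bernoulli(torch.full_like(weights, self.expansion_prob))
            keep = torch.where(input_term_mask.bool(), torch.ones_like(gate), gate)
            weights = weights * keep
        return weights
```

At inference time the gate is disabled, so all activated terms are kept and the resulting sparse vectors can be indexed and searched with a standard inverted index.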