Student Can Also be a Good Teacher: Extracting Knowledge from Vision-and-Language Model for Cross-Modal Retrieval

Jun Rao, Tao Qian, Shuhan Qi, Yulin Wu, Qing Liao, Xuan Wang
{"title":"Student Can Also be a Good Teacher: Extracting Knowledge from Vision-and-Language Model for Cross-Modal Retrieval","authors":"Jun Rao, Tao Qian, Shuhan Qi, Yulin Wu, Qing Liao, Xuan Wang","doi":"10.1145/3459637.3482194","DOIUrl":null,"url":null,"abstract":"Astounding results from transformer models with Vision-and Language Pretraining (VLP) on joint vision-and-language downstream tasks have intrigued the multi-modal community. On the one hand, these models are usually so huge that make us more difficult to fine-tune and serve real-time online applications. On the other hand, the compression of the original transformer block will ignore the difference in information between modalities, which leads to the sharp decline of retrieval accuracy. In this work, we present a very light and effective cross-modal retrieval model compression method. With this method, by adopting a novel random replacement strategy and knowledge distillation, our module can learn the knowledge of the teacher with nearly the half number of parameters reduction. Furthermore, our compression method achieves nearly 130x acceleration with acceptable accuracy. To overcome the sharp decline in retrieval tasks because of compression, we introduce the co-attention interaction module to reflect the different information and interaction information. Experiments show that a multi-modal co-attention block is more suitable for cross-modal retrieval tasks rather than the source transformer encoder block.","PeriodicalId":405296,"journal":{"name":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th ACM International Conference on Information & Knowledge Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3459637.3482194","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8

Abstract

Astounding results from transformer models with Vision-and-Language Pretraining (VLP) on joint vision-and-language downstream tasks have intrigued the multi-modal community. On the one hand, these models are usually so large that they are difficult to fine-tune and to serve in real-time online applications. On the other hand, compressing the original transformer blocks directly ignores the informational differences between modalities, which leads to a sharp decline in retrieval accuracy. In this work, we present a very light and effective compression method for cross-modal retrieval models. By adopting a novel random replacement strategy together with knowledge distillation, our module learns the teacher's knowledge while reducing the parameter count by nearly half. Furthermore, our compression method achieves nearly 130x acceleration with acceptable accuracy. To counter the sharp accuracy drop that compression causes in retrieval tasks, we introduce a co-attention interaction module that captures both modality-specific information and cross-modal interaction information. Experiments show that a multi-modal co-attention block is better suited to cross-modal retrieval tasks than the original transformer encoder block.
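The abstract does not spell out how the random replacement strategy works; a common realization of this idea is module replacement in the style of BERT-of-Theseus, where each frozen teacher block is stochastically swapped for its smaller student counterpart during training. The sketch below assumes that design; the names (`RandomReplacementEncoder`, `p_replace`) are illustrative assumptions, not the paper's actual API.

```python
# A minimal PyTorch sketch of random-replacement compression, assuming a
# BERT-of-Theseus-style setup. Not the paper's actual implementation.
import random

import torch
import torch.nn as nn


class RandomReplacementEncoder(nn.Module):
    """During training, each teacher block is swapped for its smaller student
    counterpart with probability p_replace, so the student blocks learn to
    imitate the teacher's behavior in context."""

    def __init__(self, teacher_blocks, student_blocks, p_replace=0.5):
        super().__init__()
        assert len(teacher_blocks) == len(student_blocks)
        self.teacher_blocks = nn.ModuleList(teacher_blocks)
        self.student_blocks = nn.ModuleList(student_blocks)
        self.p_replace = p_replace
        # Teacher weights stay frozen; only the student blocks are trained.
        for param in self.teacher_blocks.parameters():
            param.requires_grad = False

    def forward(self, hidden):
        # Each layer independently routes through the student or the teacher.
        for teacher, student in zip(self.teacher_blocks, self.student_blocks):
            if self.training and random.random() < self.p_replace:
                hidden = student(hidden)
            else:
                hidden = teacher(hidden)
        return hidden

    @torch.no_grad()
    def compressed_forward(self, hidden):
        # After training converges, serve only the compact student stack.
        for student in self.student_blocks:
            hidden = student(hidden)
        return hidden
```

At serving time only the student stack runs, which is consistent with the abstract's claim of roughly halving the parameter count while retaining the teacher's knowledge.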
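Likewise, the co-attention interaction module is only named, not specified. A typical multi-modal co-attention block consists of two cross-attention streams in which each modality queries the other; the sketch below assumes that layout for illustration and is not the paper's exact block.

```python
# A minimal PyTorch sketch of a multi-modal co-attention block. The
# two-stream cross-attention layout is an assumption for illustration.
import torch
import torch.nn as nn


class CoAttentionBlock(nn.Module):
    """Text tokens attend over image features and vice versa, so each
    modality's representation is conditioned on the other modality."""

    def __init__(self, dim=768, num_heads=8):
        super().__init__()
        self.txt_attends_img = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.img_attends_txt = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_txt = nn.LayerNorm(dim)
        self.norm_img = nn.LayerNorm(dim)

    def forward(self, txt, img):
        # Queries come from one modality; keys/values from the other.
        txt_out, _ = self.txt_attends_img(query=txt, key=img, value=img)
        img_out, _ = self.img_attends_txt(query=img, key=txt, value=txt)
        # Residual connection plus layer norm, per modality.
        return self.norm_txt(txt + txt_out), self.norm_img(img + img_out)
```

Keeping the two modality streams separate is what lets the block "reflect the different information" per modality while the cross-attention supplies the interaction information, in line with the abstract's motivation for replacing the plain transformer encoder block.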