From Multimodal to Unimodal Attention in Transformers using Knowledge Distillation

2021 17th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Pub Date: 2021-10-15. DOI: 10.1109/AVSS52988.2021.9663793

Dhruv Agarwal, Tanay Agrawal, Laura M. Ferrari, François Bremond


Citations: 2

Abstract

Multimodal deep learning has garnered much interest, and transformers have triggered novel approaches thanks to the cross-attention mechanism. Here we propose an approach to deal with two key existing challenges: the high demand for computational resources and the issue of missing modalities. We introduce for the first time the concept of knowledge distillation in transformers to use only one modality at inference time. We report a full study analyzing multiple student-teacher configurations, the levels at which distillation is applied, and different methodologies. With the best configuration, we improved the state-of-the-art accuracy by 3%, reduced the number of parameters by 2.5 times, and reduced the inference time by 22%. Such a performance-computation tradeoff can be exploited in many applications, and we aim to open a new research area where the deployment of complex models with limited resources is demanded.
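The abstract does not spell out the distillation objective, so the following is only a minimal sketch of the generic soft-target form of knowledge distillation, not the authors' method. It assumes a multimodal teacher and a unimodal student that both emit class logits; the function names, the temperature, and the weight `alpha` are hypothetical choices made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    Assumption: this is the textbook soft-target loss, used here as a stand-in
    because the abstract does not state which loss the paper applies.
    """
    # Hard-label term: cross-entropy of the student against the true class.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label])

    # Soft-target term: KL(teacher || student) on temperature-softened outputs.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)

    # Scaling by T^2 keeps the soft-term gradients comparable across temperatures.
    return alpha * cross_entropy + (1.0 - alpha) * temperature ** 2 * kl


# Hypothetical logits: the teacher saw every modality, the student only one.
teacher_logits = [0.2, 1.1, 3.0]
student_logits = [0.5, 0.9, 2.1]
loss = distillation_loss(student_logits, teacher_logits, label=2)
```

Under this sketch only the student is needed at inference time, which is how a single-modality model can stand in for the multimodal one; the paper additionally studies the levels at which distillation is applied, which this example does not cover.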

