Sparse Double Descent in Vision Transformers: real or phantom threat?

Victor Quétu, Marta Milovanović, Enzo Tartaglione
{"title":"视觉变形金刚的稀疏双下降:真实的还是虚幻的威胁?","authors":"Victor Qu'etu, Marta Milovanović, Enzo Tartaglione","doi":"10.48550/arXiv.2307.14253","DOIUrl":null,"url":null,"abstract":"Vision transformers (ViT) have been of broad interest in recent theoretical and empirical works. They are state-of-the-art thanks to their attention-based approach, which boosts the identification of key features and patterns within images thanks to the capability of avoiding inductive bias, resulting in highly accurate image analysis. Meanwhile, neoteric studies have reported a ``sparse double descent'' phenomenon that can occur in modern deep-learning models, where extremely over-parametrized models can generalize well. This raises practical questions about the optimal size of the model and the quest over finding the best trade-off between sparsity and performance is launched: are Vision Transformers also prone to sparse double descent? Can we find a way to avoid such a phenomenon? Our work tackles the occurrence of sparse double descent on ViTs. Despite some works that have shown that traditional architectures, like Resnet, are condemned to the sparse double descent phenomenon, for ViTs we observe that an optimally-tuned $\\ell_2$ regularization relieves such a phenomenon. However, everything comes at a cost: optimal lambda will sacrifice the potential compression of the ViT.","PeriodicalId":74527,"journal":{"name":"Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing","volume":"413 1","pages":"490-502"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Sparse Double Descent in Vision Transformers: real or phantom threat?\",\"authors\":\"Victor Qu'etu, Marta Milovanović, Enzo Tartaglione\",\"doi\":\"10.48550/arXiv.2307.14253\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Vision transformers (ViT) have been of broad interest in recent theoretical and empirical works. They are state-of-the-art thanks to their attention-based approach, which boosts the identification of key features and patterns within images thanks to the capability of avoiding inductive bias, resulting in highly accurate image analysis. Meanwhile, neoteric studies have reported a ``sparse double descent'' phenomenon that can occur in modern deep-learning models, where extremely over-parametrized models can generalize well. This raises practical questions about the optimal size of the model and the quest over finding the best trade-off between sparsity and performance is launched: are Vision Transformers also prone to sparse double descent? Can we find a way to avoid such a phenomenon? Our work tackles the occurrence of sparse double descent on ViTs. Despite some works that have shown that traditional architectures, like Resnet, are condemned to the sparse double descent phenomenon, for ViTs we observe that an optimally-tuned $\\\\ell_2$ regularization relieves such a phenomenon. However, everything comes at a cost: optimal lambda will sacrifice the potential compression of the ViT.\",\"PeriodicalId\":74527,\"journal\":{\"name\":\"Proceedings of the ... International Conference on Image Analysis and Processing. 
International Conference on Image Analysis and Processing\",\"volume\":\"413 1\",\"pages\":\"490-502\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-07-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48550/arXiv.2307.14253\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... International Conference on Image Analysis and Processing. International Conference on Image Analysis and Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2307.14253","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

Vision transformers (ViTs) have attracted broad interest in recent theoretical and empirical works. They are state-of-the-art thanks to their attention-based approach, which avoids inductive bias and thereby improves the identification of key features and patterns within images, resulting in highly accurate image analysis. Meanwhile, recent studies have reported a ``sparse double descent'' phenomenon that can occur in modern deep-learning models, where extremely over-parametrized models can still generalize well. This raises practical questions about the optimal size of the model and launches the quest for the best trade-off between sparsity and performance: are vision transformers also prone to sparse double descent? Can we find a way to avoid this phenomenon? Our work tackles the occurrence of sparse double descent in ViTs. While some works have shown that traditional architectures, like ResNet, are condemned to the sparse double descent phenomenon, for ViTs we observe that an optimally-tuned $\ell_2$ regularization relieves it. However, everything comes at a cost: the optimal $\lambda$ sacrifices the potential compression of the ViT.
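To make the experimental protocol behind these questions concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' code): it sweeps one-shot magnitude pruning over a ViT trained with an $\ell_2$ penalty, where SGD's `weight_decay` plays the role of $\lambda$. The model choice (torchvision's `vit_b_16`), the sparsity grid, the one-epoch schedules, and the synthetic data are illustrative assumptions only.

```python
# Illustrative sketch only (not the authors' code): probe for sparse double descent
# by sweeping one-shot magnitude pruning on a ViT trained with l2 regularization
# (SGD's weight_decay plays the role of lambda). Model, grid, and data are assumptions.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import vit_b_16


def magnitude_prune(model: nn.Module, amount: float) -> None:
    """Globally remove the `amount` fraction of smallest-magnitude Linear weights."""
    params = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]
    prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=amount)


def train_one_epoch(model, loader, opt, device):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        opt.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        opt.step()


@torch.no_grad()
def accuracy(model, loader, device):
    model.eval()
    correct = total = 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total


def sparsity_sweep(train_loader, test_loader, weight_decay, sparsities, device="cpu"):
    """Train, prune, and fine-tune a fresh ViT at each sparsity level; return test accuracies."""
    accs = {}
    for s in sparsities:
        model = vit_b_16(weights=None, num_classes=10).to(device)
        opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                              weight_decay=weight_decay)  # weight_decay == l2 strength (lambda)
        train_one_epoch(model, train_loader, opt, device)  # dense training (shortened to 1 epoch)
        magnitude_prune(model, amount=s)                   # one-shot magnitude pruning
        train_one_epoch(model, train_loader, opt, device)  # fine-tune the surviving weights
        accs[s] = accuracy(model, test_loader, device)
    # A dip followed by a recovery across `sparsities` would signal sparse double descent.
    return accs


if __name__ == "__main__":
    # Tiny synthetic stand-in for a real dataset, just so the sketch executes end to end.
    x = torch.randn(32, 3, 224, 224)
    y = torch.randint(0, 10, (32,))
    loader = DataLoader(TensorDataset(x, y), batch_size=8)
    print(sparsity_sweep(loader, loader, weight_decay=1e-4,
                         sparsities=[0.0, 0.5, 0.9, 0.99]))
```

Plotting the returned accuracies against sparsity for several values of `weight_decay` would illustrate the trade-off the abstract describes: a $\lambda$ large enough to flatten the accuracy dip also limits how far the model can be pruned before performance degrades.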