Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

IF 18.6
Peng Mi;Li Shen;Tianhe Ren;Yiyi Zhou;Tianshuo Xu;Xiaoshuai Sun;Tongliang Liu;Rongrong Ji;Dacheng Tao
{"title":"稀疏摄动锐度感知最小化优化器的系统研究","authors":"Peng Mi;Li Shen;Tianhe Ren;Yiyi Zhou;Tianshuo Xu;Xiaoshuai Sun;Tongliang Liu;Rongrong Ji;Dacheng Tao","doi":"10.1109/TPAMI.2025.3581310","DOIUrl":null,"url":null,"abstract":"Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computation, double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and <inline-formula><tex-math>$N$</tex-math></inline-formula>:<inline-formula><tex-math>$M$</tex-math></inline-formula> structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. We theoretically prove that SSAM can converge at the same rate as SAM, i.e., <inline-formula><tex-math>$O(\\log T/\\sqrt{T})$</tex-math></inline-formula> . Sparse SAM has the potential to accelerate training and smooth the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and the performance is preserved or even improved with a perturbation of merely 50% sparsity.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"8538-8549"},"PeriodicalIF":18.6000,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer\",\"authors\":\"Peng Mi;Li Shen;Tianhe Ren;Yiyi Zhou;Tianshuo Xu;Xiaoshuai Sun;Tongliang Liu;Rongrong Ji;Dacheng Tao\",\"doi\":\"10.1109/TPAMI.2025.3581310\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computation, double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and <inline-formula><tex-math>$N$</tex-math></inline-formula>:<inline-formula><tex-math>$M$</tex-math></inline-formula> structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. 
We theoretically prove that SSAM can converge at the same rate as SAM, i.e., <inline-formula><tex-math>$O(\\\\log T/\\\\sqrt{T})$</tex-math></inline-formula> . Sparse SAM has the potential to accelerate training and smooth the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and the performance is preserved or even improved with a perturbation of merely 50% sparsity.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 10\",\"pages\":\"8538-8549\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11054316/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11054316/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computation, double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and $N$:$M$ structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. We theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$ . Sparse SAM has the potential to accelerate training and smooth the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and the performance is preserved or even improved with a perturbation of merely 50% sparsity.
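To make the setup concrete, the display below recalls the standard SAM objective from the SAM literature together with the sparse-perturbation variant described in the abstract. The notation ($\rho$ for the perturbation radius, $m$ for the binary mask, $d$ for the parameter dimension) is illustrative and may differ from the paper's own.

```latex
% Standard SAM objective and the sparse-perturbation (binary-mask) variant
% described in the abstract. Notation ($\rho$, $m$, $d$) is illustrative.
\begin{align}
  \text{SAM:}\quad        & \min_{w}\;\max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
                            \qquad \hat{\epsilon} \approx \rho\,\frac{\nabla L(w)}{\|\nabla L(w)\|_2} \\
  \text{Sparse SAM:}\quad & \min_{w}\;\max_{\|\epsilon\|_2 \le \rho} L(w + m \odot \epsilon),
                            \qquad m \in \{0,1\}^{d}
\end{align}
```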
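As a concrete illustration of the two-pass, masked-perturbation scheme, here is a minimal PyTorch-style sketch of one SSAM-like training step. It assumes a precomputed per-parameter binary mask and applies the mask to SAM's first-order perturbation; the helper name `ssam_step`, the toy model, and the random 50%-sparsity mask are assumptions for illustration, not the authors' implementation (the paper additionally distinguishes explicit and implicit forms of applying the sparse perturbation).

```python
# A minimal, illustrative sketch of one SSAM-style update step, assuming a
# PyTorch model and a precomputed binary mask per parameter. The mask source
# (Fisher information or dynamic sparse training) is abstracted away; `masks`,
# `rho`, and the toy model are placeholders, not the authors' implementation.
import torch
import torch.nn as nn

def ssam_step(model, loss_fn, x, y, optimizer, masks, rho=0.05):
    # 1) First forward/backward pass: gradient at the current weights w.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # 2) Build SAM's first-order perturbation e = rho * g / ||g|| and keep
    #    only the masked entries (binary mask m, SSAM's sparse perturbation).
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
             for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    perturbs = []
    with torch.no_grad():
        for p, g, m in zip(model.parameters(), grads, masks):
            e = rho * g / grad_norm * m        # sparse perturbation
            p.add_(e)                          # w <- w + m * e
            perturbs.append(e)

    # 3) Second forward/backward pass at the perturbed weights w + m * e.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # 4) Undo the perturbation and take the real optimizer step using the
    #    gradient evaluated at the perturbed point.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbs):
            p.sub_(e)
    optimizer.step()
    return loss.item()

# Toy usage with a random 50%-sparsity mask (for illustration only).
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
masks = [(torch.rand_like(p) > 0.5).float() for p in model.parameters()]
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
ssam_step(model, nn.CrossEntropyLoss(), x, y, opt, masks, rho=0.05)
```

Only the masked weights are moved to the adversarial point in the second pass, which is the mechanism behind the efficiency claim in the abstract.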
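The abstract mentions a Fisher-information-based route to obtaining the mask but gives no details. The sketch below shows one common way such a mask could be formed, scoring parameters with a diagonal Fisher proxy (squared gradients accumulated over a few batches) and keeping the top-scoring 50% of entries. The helper `fisher_mask`, its arguments, and the convention that 1 means "perturbed" are assumptions for illustration, not the paper's specification.

```python
# Hedged sketch: building a binary mask from a diagonal Fisher-information
# proxy (accumulated squared gradients), keeping the top (1 - sparsity)
# fraction of entries. The paper's actual criterion and schedule may differ.
import torch

def fisher_mask(model, loss_fn, loader, sparsity=0.5, n_batches=8):
    scores = [torch.zeros_like(p) for p in model.parameters()]
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for s, p in zip(scores, model.parameters()):
            if p.grad is not None:
                s += p.grad.detach() ** 2          # diagonal Fisher proxy
    flat = torch.cat([s.flatten() for s in scores])
    k = int((1.0 - sparsity) * flat.numel())       # number of entries to keep
    threshold = torch.topk(flat, k).values.min()
    return [(s >= threshold).float() for s in scores]  # 1 = perturb, 0 = freeze
```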