Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

IF 18.6
Peng Mi;Li Shen;Tianhe Ren;Yiyi Zhou;Tianshuo Xu;Xiaoshuai Sun;Tongliang Liu;Rongrong Ji;Dacheng Tao
{"title":"稀疏摄动锐度感知最小化优化器的系统研究","authors":"Peng Mi;Li Shen;Tianhe Ren;Yiyi Zhou;Tianshuo Xu;Xiaoshuai Sun;Tongliang Liu;Rongrong Ji;Dacheng Tao","doi":"10.1109/TPAMI.2025.3581310","DOIUrl":null,"url":null,"abstract":"Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computation, double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and <inline-formula><tex-math>$N$</tex-math></inline-formula>:<inline-formula><tex-math>$M$</tex-math></inline-formula> structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. We theoretically prove that SSAM can converge at the same rate as SAM, i.e., <inline-formula><tex-math>$O(\\log T/\\sqrt{T})$</tex-math></inline-formula> . Sparse SAM has the potential to accelerate training and smooth the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and the performance is preserved or even improved with a perturbation of merely 50% sparsity.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 10","pages":"8538-8549"},"PeriodicalIF":18.6000,"publicationDate":"2025-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer\",\"authors\":\"Peng Mi;Li Shen;Tianhe Ren;Yiyi Zhou;Tianshuo Xu;Xiaoshuai Sun;Tongliang Liu;Rongrong Ji;Dacheng Tao\",\"doi\":\"10.1109/TPAMI.2025.3581310\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computation, double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and <inline-formula><tex-math>$N$</tex-math></inline-formula>:<inline-formula><tex-math>$M$</tex-math></inline-formula> structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. 
We theoretically prove that SSAM can converge at the same rate as SAM, i.e., <inline-formula><tex-math>$O(\\\\log T/\\\\sqrt{T})$</tex-math></inline-formula> . Sparse SAM has the potential to accelerate training and smooth the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and the performance is preserved or even improved with a perturbation of merely 50% sparsity.\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":\"47 10\",\"pages\":\"8538-8549\"},\"PeriodicalIF\":18.6000,\"publicationDate\":\"2025-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11054316/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11054316/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computation, double the overhead of common optimizers like Stochastic Gradient Descent (SGD). In this paper, we propose Sparse SAM (SSAM), an efficient and effective training scheme that achieves sparse perturbation by a binary mask. To obtain the sparse mask, we provide two solutions based on Fisher information and dynamic sparse training, respectively. We investigate the impact of different masks, including unstructured, structured, and $N$:$M$ structured patterns, as well as explicit and implicit forms of implementing sparse perturbation. We theoretically prove that SSAM can converge at the same rate as SAM, i.e., $O(\log T/\sqrt{T})$ . Sparse SAM has the potential to accelerate training and smooth the loss landscape effectively. Extensive experimental results on CIFAR and ImageNet-1K confirm that our method is superior to SAM in terms of efficiency, and the performance is preserved or even improved with a perturbation of merely 50% sparsity.
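To make the setup concrete, the display below recalls the standard SAM objective from the SAM literature together with the sparse-perturbation variant described in the abstract. The notation ($\rho$ for the perturbation radius, $m$ for the binary mask, $d$ for the parameter dimension) is illustrative and may differ from the paper's own.

```latex
% Standard SAM objective and the sparse-perturbation (binary-mask) variant
% described in the abstract. Notation ($\rho$, $m$, $d$) is illustrative.
\begin{align}
  \text{SAM:}\quad        & \min_{w}\;\max_{\|\epsilon\|_2 \le \rho} L(w + \epsilon),
                            \qquad \hat{\epsilon} \approx \rho\,\frac{\nabla L(w)}{\|\nabla L(w)\|_2} \\
  \text{Sparse SAM:}\quad & \min_{w}\;\max_{\|\epsilon\|_2 \le \rho} L(w + m \odot \epsilon),
                            \qquad m \in \{0,1\}^{d}
\end{align}
```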
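As a concrete illustration of the two-pass, masked-perturbation scheme, here is a minimal PyTorch-style sketch of one SSAM-like training step. It assumes a precomputed per-parameter binary mask and applies the mask to SAM's first-order perturbation; the helper name `ssam_step`, the toy model, and the random 50%-sparsity mask are assumptions for illustration, not the authors' implementation (the paper additionally distinguishes explicit and implicit forms of applying the sparse perturbation).

```python
# A minimal, illustrative sketch of one SSAM-style update step, assuming a
# PyTorch model and a precomputed binary mask per parameter. The mask source
# (Fisher information or dynamic sparse training) is abstracted away; `masks`,
# `rho`, and the toy model are placeholders, not the authors' implementation.
import torch
import torch.nn as nn

def ssam_step(model, loss_fn, x, y, optimizer, masks, rho=0.05):
    # 1) First forward/backward pass: gradient at the current weights w.
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # 2) Build SAM's first-order perturbation e = rho * g / ||g|| and keep
    #    only the masked entries (binary mask m, SSAM's sparse perturbation).
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
             for p in model.parameters()]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    perturbs = []
    with torch.no_grad():
        for p, g, m in zip(model.parameters(), grads, masks):
            e = rho * g / grad_norm * m        # sparse perturbation
            p.add_(e)                          # w <- w + m * e
            perturbs.append(e)

    # 3) Second forward/backward pass at the perturbed weights w + m * e.
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # 4) Undo the perturbation and take the real optimizer step using the
    #    gradient evaluated at the perturbed point.
    with torch.no_grad():
        for p, e in zip(model.parameters(), perturbs):
            p.sub_(e)
    optimizer.step()
    return loss.item()

# Toy usage with a random 50%-sparsity mask (for illustration only).
model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
masks = [(torch.rand_like(p) > 0.5).float() for p in model.parameters()]
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
ssam_step(model, nn.CrossEntropyLoss(), x, y, opt, masks, rho=0.05)
```

Only the masked weights are moved to the adversarial point in the second pass, which is the mechanism behind the efficiency claim in the abstract.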
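The abstract mentions a Fisher-information-based route to obtaining the mask but gives no details. The sketch below shows one common way such a mask could be formed, scoring parameters with a diagonal Fisher proxy (squared gradients accumulated over a few batches) and keeping the top-scoring 50% of entries. The helper `fisher_mask`, its arguments, and the convention that 1 means "perturbed" are assumptions for illustration, not the paper's specification.

```python
# Hedged sketch: building a binary mask from a diagonal Fisher-information
# proxy (accumulated squared gradients), keeping the top (1 - sparsity)
# fraction of entries. The paper's actual criterion and schedule may differ.
import torch

def fisher_mask(model, loss_fn, loader, sparsity=0.5, n_batches=8):
    scores = [torch.zeros_like(p) for p in model.parameters()]
    for i, (x, y) in enumerate(loader):
        if i >= n_batches:
            break
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for s, p in zip(scores, model.parameters()):
            if p.grad is not None:
                s += p.grad.detach() ** 2          # diagonal Fisher proxy
    flat = torch.cat([s.flatten() for s in scores])
    k = int((1.0 - sparsity) * flat.numel())       # number of entries to keep
    threshold = torch.topk(flat, k).values.min()
    return [(s >= threshold).float() for s in scores]  # 1 = perturb, 0 = freeze
```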