Unifying and revisiting Sharpness-Aware Minimization with noise-injected micro-batch scheduler for efficiency improvement

Impact factor 6.0 · CAS Zone 1 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Neural Networks · Published 2025-02-03 · DOI: 10.1016/j.neunet.2025.107205

Zheng Wei, Xingjun Zhang, Zhendong Tan

Volume 185, Article 107205 · https://www.sciencedirect.com/science/article/pii/S089360802500084X

Citations: 0

Abstract

Sharpness-aware minimization (SAM) has been proposed to improve generalization by encouraging the model to converge to a flatter region of the loss landscape. However, SAM's two sequential gradient computations incur 2× the computation of the base optimizer (e.g., SGD). Recent works improve SAM's efficiency either by switching between SAM and the base optimizer or by reducing the number of data samples. In this paper, we first propose a micro-batch scheduler that unifies these two ideas, and identify what they have in common: both use a smaller micro-batch to approximate the perturbation. However, the role of this micro-batch approximation has not been fully explored. We therefore revisit the effect of micro-batch approximated perturbation on accuracy and efficiency, and empirically observe that an overly small micro-batch degrades accuracy because it leads to a sharper loss landscape. To alleviate this, we inject random noise into the micro-batch approximated gradient in SAM's first (ascent) step, which implicitly applies a random perturbation before SAM's second (descent) step. Visualization results confirm that this encourages the model to converge to a flatter region. Extensive experiments with various models (e.g., ResNet-18/50, WideResNet-28-10, PyramidNet-110, and ViT-B/16) on CIFAR-10 and ImageNet-1K show that the proposed method achieves competitive accuracy with higher efficiency than several efficient SAM variants (e.g., ESAM, LookSAM-5, AE-SAM, and K-SAM).
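The abstract describes three mechanisms: SAM's two sequential gradient passes, approximating the ascent-step perturbation with a smaller micro-batch, and injecting random noise into that approximated gradient. The following is a minimal NumPy sketch of these ideas on a toy linear-regression problem, not the authors' implementation. The scheduler `micro_batch_size`, its `frac` and `period` parameters, the isotropic Gaussian noise model controlled by `noise_std`, and all hyperparameter values are illustrative assumptions. Treating SAM/SGD switching as a micro-batch size of 0 is likewise our own reading of the unification, not a definition taken from the paper.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """MSE loss of the linear model X @ w and its gradient w.r.t. w."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)


def micro_batch_size(step, batch_size, frac=0.25, period=1):
    """Hypothetical micro-batch scheduler (illustrative assumption).

    Returns 0 on steps that skip the perturbation (a plain SGD step, as in
    SAM/SGD switching) and round(frac * batch_size) otherwise (as in
    sample-reduction variants). frac=1.0, period=1 gives standard SAM.
    """
    if step % period != 0:
        return 0
    return max(1, int(round(frac * batch_size)))


def sam_step(w, X, y, m, lr=0.1, rho=0.05, noise_std=0.0, rng=None):
    """One SAM-style update whose ascent gradient uses only m samples.

    Gaussian noise with std noise_std is added to the micro-batch ascent
    gradient before it is normalised (the noise model is an assumption).
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = np.zeros_like(w)
    if m > 0:
        idx = rng.choice(len(y), size=m, replace=False)
        _, g_asc = loss_and_grad(w, X[idx], y[idx])  # pass 1: ascent (micro-batch)
        g_asc = g_asc + noise_std * rng.standard_normal(w.shape)
        eps = rho * g_asc / (np.linalg.norm(g_asc) + 1e-12)
    _, g_desc = loss_and_grad(w + eps, X, y)  # pass 2: descent (full batch)
    return w - lr * g_desc


# Toy demo: noisy linear regression with hypothetical hyperparameters.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(256)

w = np.zeros(10)
initial_loss = loss_and_grad(w, X, y)[0]
for step in range(200):
    m = micro_batch_size(step, len(y), frac=0.25, period=2)
    w = sam_step(w, X, y, m, lr=0.1, rho=0.05, noise_std=0.01, rng=rng)
final_loss = loss_and_grad(w, X, y)[0]
print(f"loss: {initial_loss:.3f} -> {final_loss:.4f}")
```

Under these assumptions, a perturbed step with micro-batch size m costs roughly 1 + m/B full-batch gradient evaluations (B being the batch size) instead of SAM's 2, and a skipped step costs 1. Setting `noise_std=0.0` isolates the effect of the micro-batch approximation alone.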


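Source journal: Neural Networks (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.90
Self-citation rate: 7.70%
Articles published: 425
Review time: 67 days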

About the journal: Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.