Chengli Tan, Jiangshe Zhang, Junmin Liu, Yihong Gong
{"title":"锐度感知前瞻,加速收敛并提高泛化能力","authors":"Chengli Tan, Jiangshe Zhang, Junmin Liu, Yihong Gong","doi":"10.1109/TPAMI.2024.3444002","DOIUrl":null,"url":null,"abstract":"<p><p>Lookahead is a popular stochastic optimizer that can accelerate the training process of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides the training process into two stages. In the first stage, the direction towards flat regions is determined by leveraging a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, however, it is determined by Sharpness-Aware Minimization (SAM), which is particularly effective in improving generalization at the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying superior generalization performance compared to the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrate the advantages of SALA over Lookahead. It is noteworthy that with approximately 25% more computational overhead than the base optimizer, SALA can achieve the same generalization performance as SAM which requires twice the training budget of the base optimizer.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Sharpness-aware Lookahead for Accelerating Convergence and Improving Generalization.\",\"authors\":\"Chengli Tan, Jiangshe Zhang, Junmin Liu, Yihong Gong\",\"doi\":\"10.1109/TPAMI.2024.3444002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>Lookahead is a popular stochastic optimizer that can accelerate the training process of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides the training process into two stages. In the first stage, the direction towards flat regions is determined by leveraging a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, however, it is determined by Sharpness-Aware Minimization (SAM), which is particularly effective in improving generalization at the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying superior generalization performance compared to the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrate the advantages of SALA over Lookahead. It is noteworthy that with approximately 25% more computational overhead than the base optimizer, SALA can achieve the same generalization performance as SAM which requires twice the training budget of the base optimizer.</p>\",\"PeriodicalId\":94034,\"journal\":{\"name\":\"IEEE transactions on pattern analysis and machine intelligence\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on pattern analysis and machine intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TPAMI.2024.3444002\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2024.3444002","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
Lookahead 是一种流行的随机优化器,可以加速深度神经网络的训练过程。然而,Lookahead 找到的解决方案的泛化效果往往不如其基础优化器(如 SGD 和 Adam)。为了解决这个问题,我们提出了锐度感知 Lookahead(SALA),这是一种新颖的优化器,旨在识别泛化效果好的平最小值。SALA 将训练过程分为两个阶段。在第一阶段,通过对优化轨迹进行二次逼近来确定平坦区域的方向,而不会产生任何额外的计算开销。而在第二阶段,则通过锐度感知最小化(SAM)来确定,这对提高训练末期的泛化效果尤为有效。与 Lookahead 相比,SALA 既保留了加速收敛的优点,又比基础优化器具有更优越的泛化性能。对预期超额风险的理论分析,以及对典型神经网络架构和数据集的实证结果,都证明了 SALA 相对于 Lookahead 的优势。值得注意的是,与基础优化器相比,SALA 的计算开销大约增加了 25%,却能达到与 SAM 相同的泛化性能,而 SAM 需要的训练预算是基础优化器的两倍。
Sharpness-aware Lookahead for Accelerating Convergence and Improving Generalization.
Lookahead is a popular stochastic optimizer that can accelerate the training process of deep neural networks. However, the solutions found by Lookahead often generalize worse than those found by its base optimizers, such as SGD and Adam. To address this issue, we propose Sharpness-Aware Lookahead (SALA), a novel optimizer that aims to identify flat minima that generalize well. SALA divides the training process into two stages. In the first stage, the direction towards flat regions is determined by leveraging a quadratic approximation of the optimization trajectory, without incurring any extra computational overhead. In the second stage, however, it is determined by Sharpness-Aware Minimization (SAM), which is particularly effective in improving generalization at the terminal phase of training. In contrast to Lookahead, SALA retains the benefits of accelerated convergence while also enjoying superior generalization performance compared to the base optimizer. Theoretical analysis of the expected excess risk, as well as empirical results on canonical neural network architectures and datasets, demonstrate the advantages of SALA over Lookahead. It is noteworthy that with approximately 25% more computational overhead than the base optimizer, SALA can achieve the same generalization performance as SAM which requires twice the training budget of the base optimizer.