{"title":"Stochastic gradient accelerated by negative momentum for training deep neural networks","authors":"Xiaotian Li, Zhuang Yang, Yang Wang","doi":"10.1007/s10489-025-06900-9","DOIUrl":null,"url":null,"abstract":"<div><p>The fast and robust stochastic optimization algorithms for training deep neural networks (DNNs) are still a topic of heated discussion. As a simple but effective way, the momentum technique, which utilizes historical gradient information, shows significant promise in training DNNs both theoretically and empirically. Nonetheless, the accumulation of error gradients in stochastic settings leads to the failure of momentum techniques, e.g., Nesterov’s accelerated gradient (NAG), in accelerating stochastic optimization algorithms. To address this problem, a novel type of stochastic optimization algorithm based on negative momentum (NM) is developed and analyzed. In this work, we applied NM to vanilla stochastic gradient descent (SGD), leading to SGD-NM. Although a convex combination of previous and current historical information is adopted in SGD-NM, fewer hyperparameters are introduced than those of the existing NM techniques. Meanwhile, we establish a theoretical guarantee for the resulting SGD-NM and show that SGD-NM enjoys the same low computational cost as vanilla SGD. To further show the superiority of NM in stochastic optimization algorithms, we propose a variant of stochastically controlled stochastic gradient (SCSG) based on the negative momentum technique, termed SCSG-NM, which achieves faster convergence compared to SCSG. Finally, we conduct experiments on various DNN architectures and benchmark datasets. The comparison results with state-of-the-art stochastic optimization algorithms show the great potential of NM in accelerating stochastic optimization, including more robust to large learning rates and better generalization.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 15","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-025-06900-9","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
Fast and robust stochastic optimization algorithms for training deep neural networks (DNNs) remain a topic of active discussion. As a simple yet effective approach, the momentum technique, which exploits historical gradient information, shows significant promise for training DNNs both theoretically and empirically. Nonetheless, the accumulation of gradient errors in stochastic settings causes momentum techniques, e.g., Nesterov's accelerated gradient (NAG), to fail to accelerate stochastic optimization algorithms. To address this problem, a novel type of stochastic optimization algorithm based on negative momentum (NM) is developed and analyzed. In this work, we apply NM to vanilla stochastic gradient descent (SGD), yielding SGD-NM. Although SGD-NM adopts a convex combination of previous and current historical gradient information, it introduces fewer hyperparameters than existing NM techniques. Meanwhile, we establish a theoretical guarantee for the resulting SGD-NM and show that it enjoys the same low computational cost as vanilla SGD. To further demonstrate the benefit of NM in stochastic optimization, we propose a variant of the stochastically controlled stochastic gradient (SCSG) method based on the negative momentum technique, termed SCSG-NM, which achieves faster convergence than SCSG. Finally, we conduct experiments on various DNN architectures and benchmark datasets. Comparisons with state-of-the-art stochastic optimization algorithms show the great potential of NM for accelerating stochastic optimization, including greater robustness to large learning rates and better generalization.
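To make the negative-momentum idea concrete, below is a minimal, hypothetical sketch of a momentum-style SGD update in which the momentum coefficient is taken to be negative. The function name `sgd_negative_momentum`, the coefficient `beta`, and the toy quadratic objective are illustrative assumptions introduced here; the sketch does not reproduce the paper's exact SGD-NM update rule or its convergence guarantees.

```python
import numpy as np

def sgd_negative_momentum(grad_fn, w0, lr=0.1, beta=-0.3, n_steps=200):
    """Heavy-ball-style SGD step with a negative momentum coefficient.

    Illustrative sketch only: beta < 0 partially reverses the previous
    search direction, mimicking negative momentum; this is NOT the exact
    SGD-NM rule proposed in the paper.
    """
    w = np.array(w0, dtype=float)
    v = np.zeros_like(w)                 # previous update direction
    for _ in range(n_steps):
        g = grad_fn(w)                   # stochastic gradient estimate
        v = beta * v - lr * g            # negative beta flips the old direction's contribution
        w = w + v
    return w

# Toy usage: noisy gradient of f(w) = 0.5 * ||w||^2 (minimum at w = 0)
rng = np.random.default_rng(0)
noisy_grad = lambda w: w + 0.01 * rng.standard_normal(w.shape)
print(sgd_negative_momentum(noisy_grad, w0=np.ones(5)))
```

With the assumed values lr=0.1 and beta=-0.3, the iterates on this toy quadratic contract toward zero while the negative coefficient damps oscillation from the accumulated direction; the paper's SGD-NM and SCSG-NM variants are analyzed formally rather than via such a toy example.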
About the journal:
With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance.
The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.