{"title":"Ape优化器:基于p-Power自适应滤波器的深度学习优化方法。","authors":"Yufei Jin,Han Yang,Xinrui Wang,Yingche Xu,Zhuoran Zhang","doi":"10.1109/tnnls.2025.3610665","DOIUrl":null,"url":null,"abstract":"Deep learning has been widely applied in various domains. Current widely-used optimizers, such as SGD, Adam, and their variants, are designed based on the assumption that the gradient noise generated during model training follows a Gaussian distribution. However, recent empirical studies have found that the gradient noise often does not follow a Gaussian distribution. Instead, the noise exhibits heavy-tailed characteristics consistent with an $\\alpha $ -stable distribution, casting doubt on the performance and robustness of optimizers designed under the assumption of Gaussian noise. Inspired by the least mean p-power (LMP) algorithm from the field of adaptive filtering, we propose a novel optimizer called Ape for deep learning. Ape integrates a p-power adjustment mechanism to compress large gradients and amplify small ones, mitigating the impact of heavy-tailed gradient distributions. It also employs an approach for estimating second moments tailored to $\\alpha $ -stable distributions. Extensive experiments on benchmark datasets demonstrate Ape's effectiveness in improving both accuracy and training speed compared to existing optimizers. The Ape optimizer showcases the potential of cross-disciplinary approaches in advancing deep learning optimization techniques and lays the groundwork for future innovations in this domain.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"19 1","pages":""},"PeriodicalIF":8.9000,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Ape Optimizer: A p-Power Adaptive Filter-Based Approach for Deep Learning Optimization.\",\"authors\":\"Yufei Jin,Han Yang,Xinrui Wang,Yingche Xu,Zhuoran Zhang\",\"doi\":\"10.1109/tnnls.2025.3610665\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep learning has been widely applied in various domains. Current widely-used optimizers, such as SGD, Adam, and their variants, are designed based on the assumption that the gradient noise generated during model training follows a Gaussian distribution. However, recent empirical studies have found that the gradient noise often does not follow a Gaussian distribution. Instead, the noise exhibits heavy-tailed characteristics consistent with an $\\\\alpha $ -stable distribution, casting doubt on the performance and robustness of optimizers designed under the assumption of Gaussian noise. Inspired by the least mean p-power (LMP) algorithm from the field of adaptive filtering, we propose a novel optimizer called Ape for deep learning. Ape integrates a p-power adjustment mechanism to compress large gradients and amplify small ones, mitigating the impact of heavy-tailed gradient distributions. It also employs an approach for estimating second moments tailored to $\\\\alpha $ -stable distributions. Extensive experiments on benchmark datasets demonstrate Ape's effectiveness in improving both accuracy and training speed compared to existing optimizers. 
The Ape optimizer showcases the potential of cross-disciplinary approaches in advancing deep learning optimization techniques and lays the groundwork for future innovations in this domain.\",\"PeriodicalId\":13303,\"journal\":{\"name\":\"IEEE transactions on neural networks and learning systems\",\"volume\":\"19 1\",\"pages\":\"\"},\"PeriodicalIF\":8.9000,\"publicationDate\":\"2025-09-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on neural networks and learning systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1109/tnnls.2025.3610665\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/tnnls.2025.3610665","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Ape Optimizer: A p-Power Adaptive Filter-Based Approach for Deep Learning Optimization.
Deep learning has been widely applied in various domains. Current widely used optimizers, such as SGD, Adam, and their variants, are designed on the assumption that the gradient noise generated during model training follows a Gaussian distribution. However, recent empirical studies have found that the gradient noise often does not follow a Gaussian distribution. Instead, the noise exhibits heavy-tailed characteristics consistent with an $\alpha$-stable distribution, casting doubt on the performance and robustness of optimizers designed under the assumption of Gaussian noise. Inspired by the least mean p-power (LMP) algorithm from the field of adaptive filtering, we propose a novel optimizer called Ape for deep learning. Ape integrates a p-power adjustment mechanism that compresses large gradients and amplifies small ones, mitigating the impact of heavy-tailed gradient distributions. It also employs a second-moment estimation approach tailored to $\alpha$-stable distributions. Extensive experiments on benchmark datasets demonstrate Ape's effectiveness in improving both accuracy and training speed compared to existing optimizers. The Ape optimizer showcases the potential of cross-disciplinary approaches in advancing deep learning optimization techniques and lays the groundwork for future innovations in this domain.
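The abstract describes the p-power adjustment only at a high level. As a rough illustration of the idea it borrows from LMP adaptive filtering, the sketch below applies an element-wise transform sign(g)·|g|^(p-1) to the raw gradient inside a plain SGD step: for 1 < p < 2 this shrinks large gradient entries and lifts small ones. The class name `PPowerSGD`, the default p = 1.5, and the `eps` stabilizer are illustrative assumptions; this is not the authors' actual Ape optimizer, which also incorporates an $\alpha$-stable-tailored second-moment estimate not detailed in the abstract.

```python
import torch
from torch.optim import Optimizer


class PPowerSGD(Optimizer):
    """Hypothetical sketch of an LMP-inspired p-power gradient transform.

    Each gradient element g is replaced by sign(g) * |g|**(p - 1) before the
    SGD update; with p = 2 the transform is the identity and the step reduces
    to ordinary SGD. This is an interpretation of the abstract, not the
    paper's Ape optimizer.
    """

    def __init__(self, params, lr=1e-3, p=1.5, eps=1e-8):
        if not 1.0 < p <= 2.0:
            raise ValueError("p is assumed to lie in (1, 2]")
        defaults = dict(lr=lr, p=p, eps=eps)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            lr, p, eps = group["lr"], group["p"], group["eps"]
            for param in group["params"]:
                if param.grad is None:
                    continue
                g = param.grad
                # p-power transform: compress heavy-tailed outliers, amplify small entries
                g_p = g.sign() * (g.abs() + eps).pow(p - 1)
                param.add_(g_p, alpha=-lr)
        return loss
```

Setting p = 2 recovers plain SGD (up to the eps offset), which offers a simple sanity check of the sketch; how Ape actually schedules or adapts p is not specified in the abstract.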
Journal Introduction:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.