{"title":"DAPID: A Differential-adaptive PID Optimization Strategy for Neural Network Training","authors":"Yulin Cai, Haoqian Wang","doi":"10.1109/IJCNN55064.2022.9892746","DOIUrl":null,"url":null,"abstract":"Derived from automatic control theory, the PID optimizer for neural network training can effectively inhibit the overshoot phenomenon of conventional optimization algorithms such as SGD-Momentum. However, its differential term may unexpectedly have a relatively large scale during iteration, which may amplify the inherent noise of input samples and deteriorate the training process. In this paper, we adopt a self-adaptive iterating rule for the PID optimizer's differential term, which uses both first-order and second-order moment estimation to calculate the differential's unbiased statistical value approximately. Such strategy prevents the differential term from being divergent and accelerates the iteration without increasing much computational cost. Empirical results on several popular machine learning datasets demonstrate that the proposed optimization strategy achieves favorable acceleration of convergence as well as competitive accuracy compared with other stochastic optimization approaches.","PeriodicalId":106974,"journal":{"name":"2022 International Joint Conference on Neural Networks (IJCNN)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Joint Conference on Neural Networks (IJCNN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJCNN55064.2022.9892746","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Derived from automatic control theory, the PID optimizer for neural network training can effectively suppress the overshoot phenomenon of conventional optimization algorithms such as SGD-Momentum. However, its differential term can unexpectedly grow large during iteration, which may amplify the inherent noise of the input samples and degrade the training process. In this paper, we adopt a self-adaptive update rule for the PID optimizer's differential term, which uses both first-order and second-order moment estimation to approximate an unbiased estimate of the differential. This strategy prevents the differential term from diverging and accelerates iteration with little additional computational cost. Empirical results on several popular machine learning datasets demonstrate that the proposed optimization strategy converges noticeably faster and achieves accuracy competitive with other stochastic optimization approaches.
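The abstract outlines the idea but not the exact update rule. As a rough illustration of how Adam-style, bias-corrected first- and second-moment estimates of the gradient difference could be folded into a PID-style update, here is a minimal NumPy sketch; the class name, hyperparameter defaults, and the precise combination of terms below are assumptions for illustration, not the authors' published formulas.

```python
# Hypothetical sketch of a PID-style parameter update whose differential term
# is stabilized with Adam-like bias-corrected moment estimates. Coefficients
# (kp, ki, kd), decay rates, and the exact combination are illustrative
# assumptions, not the update rule from the paper.
import numpy as np

class DifferentialAdaptivePID:
    def __init__(self, lr=0.01, kp=1.0, ki=1.0, kd=0.1,
                 momentum=0.9, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.kp, self.ki, self.kd = lr, kp, ki, kd
        self.momentum = momentum               # decay for the integral (momentum) buffer
        self.beta1, self.beta2 = beta1, beta2  # decay rates for the differential's moments
        self.eps = eps
        self.state = {}

    def step(self, params, grads):
        """params, grads: dicts of np.ndarray keyed by parameter name."""
        for name, g in grads.items():
            st = self.state.setdefault(name, {
                "v": np.zeros_like(g),       # integral / momentum buffer
                "g_prev": np.zeros_like(g),  # gradient from the previous step
                "m": np.zeros_like(g),       # first moment of the gradient difference
                "s": np.zeros_like(g),       # second moment of the gradient difference
                "t": 0,
            })
            st["t"] += 1
            t = st["t"]

            # Integral term: classical momentum-style accumulation of gradients.
            st["v"] = self.momentum * st["v"] + g

            # Raw differential: change of the gradient between iterations.
            diff = g - st["g_prev"]
            st["g_prev"] = g.copy()

            # Adaptive differential: bias-corrected first/second moments of `diff`,
            # which keeps its scale bounded instead of letting it diverge.
            st["m"] = self.beta1 * st["m"] + (1 - self.beta1) * diff
            st["s"] = self.beta2 * st["s"] + (1 - self.beta2) * diff ** 2
            m_hat = st["m"] / (1 - self.beta1 ** t)
            s_hat = st["s"] / (1 - self.beta2 ** t)
            d_adaptive = m_hat / (np.sqrt(s_hat) + self.eps)

            # PID combination of proportional, integral, and differential terms.
            params[name] -= self.lr * (self.kp * g
                                       + self.ki * st["v"]
                                       + self.kd * d_adaptive)
        return params
```

The key point of the sketch is that the raw gradient difference `diff` is replaced by `m_hat / (sqrt(s_hat) + eps)`, whose per-element magnitude stays bounded even when individual gradient differences are noisy, which is the behavior the abstract attributes to the differential-adaptive rule.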