PWPROP: A Progressive Weighted Adaptive Method for Training Deep Neural Networks
D. Wang, Tao Xu, Huatian Zhang, Fanhua Shang, Hongying Liu, Yuanyuan Liu, Shengmei Shen
2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), October 2022
DOI: 10.1109/ICTAI56018.2022.00081
Abstract
In recent years, adaptive optimization methods for deep learning have attracted considerable attention. The AMSGRAD analysis shows that adaptive methods such as ADAM may fail to converge to the optimal solutions of some convex problems because their adaptive learning rates can diverge. However, we find that AMSGRAD may generalize worse than ADAM on some deep learning tasks. We first show that AMSGRAD may not find a flat minimum. How, then, can we design an optimization method that finds a flat minimum with low training loss? Few works address this important problem. We propose a novel progressive weighted adaptive optimization algorithm, called PWPROP, with fewer hyperparameters than counterparts such as ADAM. By intuitively constructing a "sharp-flat minima" model, we show how different second-order estimates affect the ability to escape a sharp minimum. Moreover, we prove that PWPROP addresses the non-convergence issue of ADAM and has a sublinear convergence rate for non-convex problems. Extensive experimental results show that PWPROP is effective across various deep learning architectures such as the Transformer, and achieves state-of-the-art results.
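To make the contrast the abstract draws more concrete, below is a minimal NumPy sketch of the standard ADAM and AMSGRAD update rules (not the paper's PWPROP, whose progressive weighting is not specified here). The two methods differ only in the second-moment estimate: AMSGRAD keeps a running maximum, which makes the adaptive learning rate non-increasing. Hyperparameter values and the toy quadratic objective are illustrative assumptions.

```python
import numpy as np

def adam_like_step(grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, amsgrad=False):
    """One ADAM-style update; with amsgrad=True the second moment is replaced by
    its running maximum, so the per-coordinate step size cannot grow over time."""
    m, v, v_max, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad           # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    if amsgrad:
        v_max = np.maximum(v_max, v_hat)         # AMSGRAD: monotone second moment
        denom = np.sqrt(v_max) + eps
    else:
        denom = np.sqrt(v_hat) + eps             # ADAM: second moment may shrink
    return lr * m_hat / denom, (m, v, v_max, t)

# Toy usage on f(x) = 0.5 * ||x||^2, whose gradient is simply x.
x = np.array([1.0, -2.0])
state = (np.zeros_like(x), np.zeros_like(x), np.zeros_like(x), 0)
for _ in range(200):
    step, state = adam_like_step(x, state, amsgrad=True)
    x = x - step
print(x)  # approaches the minimum at the origin
```

The abstract's "sharp-flat minima" argument concerns exactly this second-moment choice: a monotone estimate (AMSGRAD) can keep steps small near a sharp minimum, while a decaying estimate (ADAM) lets steps grow again, which affects the ability to escape it.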