Title: Convex-Concave Programming: An Effective Alternative for Optimizing Shallow Neural Networks
Authors: Mohammad Askarizadeh; Alireza Morsali; Sadegh Tofigh; Kim Khoa Nguyen
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 4, pp. 2894-2907
DOI: 10.1109/TETCI.2024.3502463
Published: 2024-12-05
URL: https://ieeexplore.ieee.org/document/10779190/
Citations: 0
Abstract
In this study, we address the challenges of non-convex optimization in neural networks (NNs) by formulating the training of multilayer perceptron (MLP) NNs as a difference of convex functions (DC) problem. Utilizing the basic convex–concave algorithm to solve our DC problems, we introduce two alternative optimization techniques, DC-GD and DC-OPT, for determining MLP parameters. By leveraging the non-uniqueness property of the convex components in DC functions, we generate strongly convex components for the DC NN cost function. This strong convexity enables our proposed algorithms, DC-GD and DC-OPT, to achieve an iteration complexity of $O\left(\log \left(\frac{1}{\varepsilon }\right)\right)$, surpassing that of other solvers, such as stochastic gradient descent (SGD), which has an iteration complexity of $O\left(\frac{1}{\varepsilon }\right)$. This improvement raises the convergence rate from sublinear (SGD) to linear (ours) while maintaining comparable total computational costs. Furthermore, conventional NN optimizers like SGD, RMSprop, and Adam are highly sensitive to the learning rate, adding computational overhead for practitioners in selecting an appropriate learning rate. In contrast, our DC-OPT algorithm is hyperparameter-free (i.e., it requires no learning rate), and our DC-GD algorithm is less sensitive to the learning rate, offering comparable accuracy to other solvers. Additionally, we extend our approach to a convolutional NN architecture, demonstrating its applicability to modern NNs. We evaluate the performance of our proposed algorithms by comparing them to conventional optimizers such as SGD, RMSprop, and Adam across various test cases. The results suggest that our approach is a viable alternative for optimizing shallow MLP NNs.
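The abstract's central claims rest on two standard facts from DC programming that can be stated concretely. If the training loss admits a decomposition $f(w) = g(w) - h(w)$ with $g$ and $h$ convex, the basic convex-concave procedure linearizes the subtracted component at the current iterate and minimizes the resulting convex surrogate, $w_{k+1} = \arg\min_{w}\; g(w) - \left(h(w_k) + \nabla h(w_k)^{\top}(w - w_k)\right)$; and because the decomposition is not unique, adding $\frac{\rho}{2}\|w\|^{2}$ to both $g$ and $h$ leaves $f$ unchanged while making every surrogate $\rho$-strongly convex, which is what underlies the linear, $O\left(\log\left(\frac{1}{\varepsilon}\right)\right)$ rate. The sketch below illustrates this procedure on a hypothetical toy DC objective; the functions and constants (g, h, rho, the inner gradient-descent solver) are illustrative assumptions and do not reproduce the paper's DC-GD or DC-OPT algorithms, whose details are not given in the abstract.

```python
# Minimal sketch of the basic convex-concave procedure (CCP) on a toy DC
# objective f(w) = g(w) - h(w). All names (g, h, ccp, rho, the toy problem)
# are illustrative assumptions; this is NOT the paper's DC-GD or DC-OPT.
import numpy as np

rho = 1.0  # strong-convexity shift added to BOTH components; f is unchanged


def g(w):  # convex component, augmented with (rho/2)||w||^2
    return np.sum(w ** 4) + 0.5 * rho * np.dot(w, w)


def h(w):  # convex component being subtracted, same augmentation
    return 2.0 * np.dot(w, w) + 0.5 * rho * np.dot(w, w)


def f(w):  # original DC objective: sum_i (w_i^4 - 2 w_i^2)
    return g(w) - h(w)


def grad_g(w):
    return 4.0 * w ** 3 + rho * w


def grad_h(w):
    return 4.0 * w + rho * w


def ccp(w0, outer_iters=50, inner_iters=200, lr=1e-2):
    """Each outer step linearizes h at w_k and minimizes the strongly convex
    surrogate g(w) - grad_h(w_k)^T w with plain gradient descent."""
    w = w0.copy()
    for _ in range(outer_iters):
        lin = grad_h(w)               # linearization of the concave part at w_k
        v = w.copy()
        for _ in range(inner_iters):  # inner solve of the convex surrogate
            v -= lr * (grad_g(v) - lin)
        w = v
    return w


w_star = ccp(np.array([2.0, -1.5]))
print("approximate minimizer:", w_star, "f(w*) =", f(w_star))
```

In this toy example $f(w) = \sum_i \left(w_i^{4} - 2 w_i^{2}\right)$, and the CCP fixed points satisfy $\nabla g(w) = \nabla h(w)$, i.e. $w_i \in \{0, \pm 1\}$; starting away from zero, the iterates settle at $\pm 1$. The augmentation by $\rho$ changes only the conditioning of each inner subproblem, not the objective, which is the non-uniqueness property the abstract exploits.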
About the journal:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication and publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for IoT and Smart-X technologies.