Massive Dimensions Reduction and Hybridization with Meta-heuristics in Deep Learning
Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman
arXiv:2408.07194 (2024-08-13), arXiv - CS - Neural and Evolutionary Computing
Abstract
Deep learning relies primarily on gradient-based optimization to train Deep Neural Network (DNN) models. Although robust and widely used, gradient-based optimization algorithms are prone to getting stuck in local minima. In the modern deep learning era, state-of-the-art DNN models have millions to billions of parameters, including weights and biases, which makes training them a huge-scale optimization problem in terms of search space. Tuning such a huge number of parameters is a challenging task that can cause vanishing/exploding gradients and overfitting; moreover, the loss functions used in training do not exactly represent the targeted performance metrics. Meta-heuristic algorithms are a practical option for exploring large and complex solution spaces. However, because DNNs have thousands to millions of parameters, even robust meta-heuristic algorithms such as Differential Evolution (DE) struggle to explore and converge efficiently in such high-dimensional search spaces, leading to very slow convergence and high memory demand. To tackle this curse of dimensionality, the concept of blocking was recently proposed: a technique that reduces the number of search-space dimensions by grouping them into blocks. In this study, we introduce Histogram-based Blocking Differential Evolution (HBDE), a novel approach that hybridizes gradient-based and gradient-free algorithms to optimize parameters. Experimental results demonstrate that HBDE reduces the number of parameters optimized by the metaheuristic in the ResNet-18 model from 11M to 3K during the training/optimization phase. On the CIFAR-10 and CIFAR-100 datasets, the proposed HBDE outperforms both the baseline gradient-based algorithm and its parent gradient-free DE algorithm, showcasing for the first time its effectiveness with reduced computational demands.
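
To make the blocking idea more concrete, below is a minimal Python/NumPy sketch of how histogram-based blocking might compress a parameter vector before applying DE. The binning scheme, the block-value encoding, and the toy DE loop (names such as histogram_block, expand, n_blocks) are illustrative assumptions for this sketch, not the authors' exact HBDE implementation, which also hybridizes with gradient-based training.

```python
import numpy as np

def histogram_block(params, n_blocks=32):
    """Assign each parameter to a histogram bin of its value.

    Returns the bin index of every parameter and one representative
    value per bin (the bin mean). This is one plausible reading of
    "histogram-based blocking", not the paper's exact procedure.
    """
    edges = np.histogram_bin_edges(params, bins=n_blocks)
    # np.digitize yields 1..n_blocks (+1 at the max); shift/clip to 0-based indices
    idx = np.clip(np.digitize(params, edges) - 1, 0, n_blocks - 1)
    block_values = np.array([
        params[idx == b].mean() if np.any(idx == b) else 0.0
        for b in range(n_blocks)
    ])
    return idx, block_values

def expand(block_values, idx):
    """Rebuild a full-length parameter vector from the per-block values."""
    return block_values[idx]

def de_optimize(loss_fn, init_blocks, pop_size=20, iters=100, F=0.5, CR=0.9, rng=None):
    """A plain DE/rand/1/bin loop over the low-dimensional block vector."""
    rng = np.random.default_rng() if rng is None else rng
    dim = init_blocks.size
    pop = init_blocks + 0.01 * rng.standard_normal((pop_size, dim))
    fitness = np.array([loss_fn(ind) for ind in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = a + F * (b - c)                       # differential mutation
            cross = rng.random(dim) < CR                   # binomial crossover mask
            trial = np.where(cross, mutant, pop[i])
            f = loss_fn(trial)
            if f <= fitness[i]:                            # greedy selection
                pop[i], fitness[i] = trial, f
    return pop[np.argmin(fitness)]

# Toy usage: optimize a 10,000-parameter vector through only 32 block values.
params = np.random.randn(10_000)
idx, blocks = histogram_block(params, n_blocks=32)
loss = lambda bv: np.mean(expand(bv, idx) ** 2)   # stand-in loss, not a real DNN
best_blocks = de_optimize(loss, blocks, iters=50)
print("loss after DE on 32 block values:", loss(best_blocks))
```

In this sketch DE searches only the 32-dimensional block space, and the full parameter vector is reconstructed on demand, which is the dimensionality-reduction effect (e.g., 11M to 3K in the abstract) that blocking is meant to provide; in a real hybrid setup the loss would be a DNN's training loss and the blocked search would be combined with gradient-based updates.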