Massive Dimensions Reduction and Hybridization with Meta-heuristics in Deep Learning
Rasa Khosrowshahli, Shahryar Rahnamayan, Beatrice Ombuki-Berman
arXiv:2408.07194 (2024-08-13), arXiv - CS - Neural and Evolutionary Computing
Abstract
Deep learning relies primarily on gradient-based optimization to train Deep Neural Network (DNN) models. Although robust and widely used, gradient-based optimization algorithms are prone to getting stuck in local minima. In the modern deep learning era, state-of-the-art DNN models have millions to billions of parameters, including weights and biases, which makes training them a huge-scale optimization problem in terms of search space. Tuning such a huge number of parameters is a challenging task that can cause vanishing/exploding gradients and overfitting; moreover, the loss functions used in training do not exactly represent the targeted performance metrics. Meta-heuristic algorithms are a practical option for exploring large and complex solution spaces. However, because DNNs have thousands to millions of parameters, even robust meta-heuristic algorithms such as Differential Evolution (DE) struggle to explore and converge efficiently in such high-dimensional search spaces, leading to very slow convergence and high memory demand. To tackle this curse of dimensionality, the concept of blocking was recently proposed: a technique that reduces the number of search-space dimensions by grouping them into blocks. In this study, we introduce Histogram-based Blocking Differential Evolution (HBDE), a novel approach that hybridizes gradient-based and gradient-free algorithms to optimize parameters. Experimental results demonstrate that HBDE reduces the number of parameters optimized by the metaheuristic in the ResNet-18 model from 11M to 3K during the training/optimization phase. On the CIFAR-10 and CIFAR-100 datasets, the proposed HBDE outperforms both the baseline gradient-based algorithm and its parent gradient-free DE algorithm, showcasing for the first time its effectiveness with reduced computational demands.
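
To make the blocking idea more concrete, below is a minimal Python/NumPy sketch of how histogram-based blocking might compress a parameter vector before applying DE. The binning scheme, the block-value encoding, and the toy DE loop (names such as histogram_block, expand, n_blocks) are illustrative assumptions for this sketch, not the authors' exact HBDE implementation, which also hybridizes with gradient-based training.

```python
import numpy as np

def histogram_block(params, n_blocks=32):
    """Assign each parameter to a histogram bin of its value.

    Returns the bin index of every parameter and one representative
    value per bin (the bin mean). This is one plausible reading of
    "histogram-based blocking", not the paper's exact procedure.
    """
    edges = np.histogram_bin_edges(params, bins=n_blocks)
    # np.digitize yields 1..n_blocks (+1 at the max); shift/clip to 0-based indices
    idx = np.clip(np.digitize(params, edges) - 1, 0, n_blocks - 1)
    block_values = np.array([
        params[idx == b].mean() if np.any(idx == b) else 0.0
        for b in range(n_blocks)
    ])
    return idx, block_values

def expand(block_values, idx):
    """Rebuild a full-length parameter vector from the per-block values."""
    return block_values[idx]

def de_optimize(loss_fn, init_blocks, pop_size=20, iters=100, F=0.5, CR=0.9, rng=None):
    """A plain DE/rand/1/bin loop over the low-dimensional block vector."""
    rng = np.random.default_rng() if rng is None else rng
    dim = init_blocks.size
    pop = init_blocks + 0.01 * rng.standard_normal((pop_size, dim))
    fitness = np.array([loss_fn(ind) for ind in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice(pop_size, 3, replace=False)]
            mutant = a + F * (b - c)                       # differential mutation
            cross = rng.random(dim) < CR                   # binomial crossover mask
            trial = np.where(cross, mutant, pop[i])
            f = loss_fn(trial)
            if f <= fitness[i]:                            # greedy selection
                pop[i], fitness[i] = trial, f
    return pop[np.argmin(fitness)]

# Toy usage: optimize a 10,000-parameter vector through only 32 block values.
params = np.random.randn(10_000)
idx, blocks = histogram_block(params, n_blocks=32)
loss = lambda bv: np.mean(expand(bv, idx) ** 2)   # stand-in loss, not a real DNN
best_blocks = de_optimize(loss, blocks, iters=50)
print("loss after DE on 32 block values:", loss(best_blocks))
```

In this sketch DE searches only the 32-dimensional block space, and the full parameter vector is reconstructed on demand, which is the dimensionality-reduction effect (e.g., 11M to 3K in the abstract) that blocking is meant to provide; in a real hybrid setup the loss would be a DNN's training loss and the blocked search would be combined with gradient-based updates.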