Large Language Model Enhanced Particle Swarm Optimization for Hyperparameter Tuning for Deep Learning Models

Saad Hameed;Basheer Qolomany;Samir Brahim Belhaouari;Mohamed Abdallah;Junaid Qadir;Ala Al-Fuqaha
{"title":"基于大语言模型增强粒子群优化的深度学习模型超参数整定","authors":"Saad Hameed;Basheer Qolomany;Samir Brahim Belhaouari;Mohamed Abdallah;Junaid Qadir;Ala Al-Fuqaha","doi":"10.1109/OJCS.2025.3564493","DOIUrl":null,"url":null,"abstract":"Determining the ideal architecture for deep learning models, such as the number of layers and neurons, is a difficult and resource-intensive process that frequently relies on human tuning or computationally costly optimization approaches. While Particle Swarm Optimization (PSO) and Large Language Models (LLMs) have been individually applied in optimization and deep learning, their combined use for enhancing convergence in numerical optimization tasks remains underexplored. Our work addresses this gap by integrating LLMs into PSO to reduce model evaluations and improve convergence for deep learning hyperparameter tuning. The proposed LLM-enhanced PSO method addresses the difficulties of efficiency and convergence by using LLMs (particularly ChatGPT-3.5 and Llama3) to improve PSO performance, allowing for faster achievement of target objectives. Our method speeds up search space exploration by substituting underperforming particle placements with best suggestions offered by LLMs. Comprehensive experiments across three scenarios—(1) optimizing the Rastrigin function, (2) using Long Short-Term Memory (LSTM) networks for time series regression, and (3) using Convolutional Neural Networks (CNNs) for material classification—show that the method significantly improves convergence rates and lowers computational costs. Depending on the application, computational complexity is lowered by 20% to 60% compared to traditional PSO methods. Llama3 achieved a 20% to 40% reduction in model calls for regression tasks, whereas ChatGPT-3.5 reduced model calls by 60% for both regression and classification tasks, all while preserving accuracy and error rates. This groundbreaking methodology offers a very efficient and effective solution for optimizing deep learning models, leading to substantial computational performance improvements across a wide range of applications.","PeriodicalId":13205,"journal":{"name":"IEEE Open Journal of the Computer Society","volume":"6 ","pages":"574-585"},"PeriodicalIF":0.0000,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10976715","citationCount":"0","resultStr":"{\"title\":\"Large Language Model Enhanced Particle Swarm Optimization for Hyperparameter Tuning for Deep Learning Models\",\"authors\":\"Saad Hameed;Basheer Qolomany;Samir Brahim Belhaouari;Mohamed Abdallah;Junaid Qadir;Ala Al-Fuqaha\",\"doi\":\"10.1109/OJCS.2025.3564493\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Determining the ideal architecture for deep learning models, such as the number of layers and neurons, is a difficult and resource-intensive process that frequently relies on human tuning or computationally costly optimization approaches. While Particle Swarm Optimization (PSO) and Large Language Models (LLMs) have been individually applied in optimization and deep learning, their combined use for enhancing convergence in numerical optimization tasks remains underexplored. Our work addresses this gap by integrating LLMs into PSO to reduce model evaluations and improve convergence for deep learning hyperparameter tuning. 
The proposed LLM-enhanced PSO method addresses the difficulties of efficiency and convergence by using LLMs (particularly ChatGPT-3.5 and Llama3) to improve PSO performance, allowing for faster achievement of target objectives. Our method speeds up search space exploration by substituting underperforming particle placements with best suggestions offered by LLMs. Comprehensive experiments across three scenarios—(1) optimizing the Rastrigin function, (2) using Long Short-Term Memory (LSTM) networks for time series regression, and (3) using Convolutional Neural Networks (CNNs) for material classification—show that the method significantly improves convergence rates and lowers computational costs. Depending on the application, computational complexity is lowered by 20% to 60% compared to traditional PSO methods. Llama3 achieved a 20% to 40% reduction in model calls for regression tasks, whereas ChatGPT-3.5 reduced model calls by 60% for both regression and classification tasks, all while preserving accuracy and error rates. This groundbreaking methodology offers a very efficient and effective solution for optimizing deep learning models, leading to substantial computational performance improvements across a wide range of applications.\",\"PeriodicalId\":13205,\"journal\":{\"name\":\"IEEE Open Journal of the Computer Society\",\"volume\":\"6 \",\"pages\":\"574-585\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-04-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10976715\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Open Journal of the Computer Society\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10976715/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of the Computer Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10976715/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Determining the ideal architecture for deep learning models, such as the number of layers and neurons, is a difficult and resource-intensive process that frequently relies on manual tuning or computationally costly optimization approaches. While Particle Swarm Optimization (PSO) and Large Language Models (LLMs) have been applied individually in optimization and deep learning, their combined use for accelerating convergence in numerical optimization tasks remains underexplored. Our work addresses this gap by integrating LLMs into PSO to reduce the number of model evaluations and improve convergence in deep learning hyperparameter tuning. The proposed LLM-enhanced PSO method tackles the twin difficulties of efficiency and convergence by using LLMs (specifically ChatGPT-3.5 and Llama3) to improve PSO performance, allowing target objectives to be reached faster. Our method speeds up exploration of the search space by replacing underperforming particle positions with the best candidates suggested by the LLMs. We run comprehensive experiments across three scenarios: (1) optimizing the Rastrigin function, (2) using Long Short-Term Memory (LSTM) networks for time-series regression, and (3) using Convolutional Neural Networks (CNNs) for material classification. The results show that the method significantly improves convergence rates and lowers computational cost. Depending on the application, computational complexity is reduced by 20% to 60% compared with traditional PSO. Llama3 achieved a 20% to 40% reduction in model calls for regression tasks, whereas ChatGPT-3.5 reduced model calls by 60% for both regression and classification tasks, all while preserving accuracy and error rates. This methodology offers an efficient and effective solution for optimizing deep learning models, yielding substantial computational performance improvements across a wide range of applications.
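
To make the core idea concrete, here is a minimal, self-contained sketch of an LLM-enhanced PSO loop on the Rastrigin benchmark. It illustrates the approach described in the abstract, not the authors' implementation: the `query_llm_for_positions` helper is a hypothetical stand-in for a prompt to ChatGPT-3.5 or Llama3 and, to keep the sketch runnable offline, simply samples near the current global best.

```python
import numpy as np

def rastrigin(x, A=10.0):
    # Standard Rastrigin benchmark: f(x) = A*n + sum(x_i^2 - A*cos(2*pi*x_i))
    return A * x.size + np.sum(x**2 - A * np.cos(2 * np.pi * x))

def query_llm_for_positions(gbest, k, dim, bounds):
    # Hypothetical stand-in for an LLM call (e.g., ChatGPT-3.5 or Llama3) that,
    # given the current global best, returns k promising candidate positions.
    # Here it samples near the global best so the sketch runs without an API.
    lo, hi = bounds
    return np.clip(gbest + 0.1 * (hi - lo) * np.random.randn(k, dim), lo, hi)

def llm_enhanced_pso(f, dim=5, n=20, iters=100, bounds=(-5.12, 5.12),
                     w=0.7, c1=1.5, c2=1.5, replace_every=10, k_worst=4):
    lo, hi = bounds
    x = np.random.uniform(lo, hi, (n, dim))    # particle positions
    v = np.zeros((n, dim))                     # particle velocities
    pbest = x.copy()                           # per-particle best positions
    pbest_f = np.array([f(p) for p in x])      # per-particle best fitness
    g = pbest[np.argmin(pbest_f)].copy()       # global best position

    for t in range(iters):
        r1, r2 = np.random.rand(n, dim), np.random.rand(n, dim)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fx = np.array([f(p) for p in x])

        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()

        if (t + 1) % replace_every == 0:
            # Core idea from the abstract: replace the k worst-performing
            # particles with LLM-suggested positions and reset their velocity.
            worst = np.argsort(fx)[-k_worst:]
            x[worst] = query_llm_for_positions(g, k_worst, dim, bounds)
            v[worst] = 0.0
    return g, pbest_f.min()

best_x, best_f = llm_enhanced_pso(rastrigin)
print("best f:", best_f)
```

Replacing only the worst-ranked particles preserves the swarm's momentum while letting the LLM steer stragglers toward promising regions; each call to the objective corresponds to one model evaluation, which is the cost the abstract reports being cut by 20% to 60%.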
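In the LSTM and CNN experiments, the objective being minimized is a trained model's validation error, so each particle must first be decoded into an architecture. The abstract does not spell out the encoding, so the sketch below is one plausible, assumed mapping from a continuous position in [0, 1]^2 to a layer count and a layer width.

```python
# Illustrative decoding of a continuous particle position into discrete
# hyperparameters; the ranges and rounding scheme are assumptions, not
# the paper's actual encoding.
def decode_particle(position, max_layers=4, max_units=256):
    layer_frac, units_frac = position          # each assumed to lie in [0, 1]
    n_layers = 1 + int(layer_frac * (max_layers - 1))       # 1..max_layers
    n_units = 8 * max(1, int(units_frac * max_units / 8))   # multiples of 8
    return {"layers": n_layers, "units": n_units}

# Each fitness evaluation trains and validates one decoded model, so the
# count of such evaluations is the "model calls" metric reported above.
print(decode_particle([0.6, 0.5]))  # -> {'layers': 2, 'units': 128}
```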
Source journal: IEEE Open Journal of the Computer Society. CiteScore: 12.60. Self-citation rate: 0.00%.