Q-learning-based hyper-heuristic algorithm for open dimension irregular packing problems

Impact Factor 4.3 | CAS Zone 2 (Engineering & Technology) | JCR Q2, Computer Science, Interdisciplinary Applications
Yongchun Wang, Qingjin Peng, Zhen Wang, Shuiquan Huang, Zhengkai Xu, Chuanzhen Huang, Baosu Guo
{"title":"Q-learning-based hyper-heuristic algorithm for open dimension irregular packing problems","authors":"Yongchun Wang ,&nbsp;Qingjin Peng ,&nbsp;Zhen Wang ,&nbsp;Shuiquan Huang ,&nbsp;Zhengkai Xu ,&nbsp;Chuanzhen Huang ,&nbsp;Baosu Guo","doi":"10.1016/j.cor.2025.107279","DOIUrl":null,"url":null,"abstract":"<div><div>Heuristic methods provide a computationally efficient framework for addressing two-dimensional irregular packing problems, particularly in resource-constrained industrial settings. As a typical combinatorial optimization problem, irregular packing exhibits exponential growth in computational complexity with increasing workpiece counts, while the solution space dynamically reconfigures due to geometric variability among workpieces. Although heuristic algorithms can generate feasible layouts within acceptable timeframes, their reliance on fixed search rule limits adaptability to diverse scenarios, necessitating more flexible approaches. In this paper, a hyper-heuristic algorithm based on Q-Learning is proposed to solve open dimension packing problems, including one-open and two-open dimension problems. Q-Learning is adopted as the high-level strategy for its ability to optimize low-level heuristic selection through reward-driven experience accumulation. The method incorporates a mixed encoding method for solution representation, four specialized low-level heuristic operators, a linear population decline mechanism, and an elite preservation strategy to balance exploration–exploitation. The Q-Learning controller dynamically selects operators by updating the Q-table based on Bellman’s equation. The proposed algorithm is compared to some advanced algorithms in general datasets. The results show that our method has better performance and applicability.</div></div>","PeriodicalId":10542,"journal":{"name":"Computers & Operations Research","volume":"185 ","pages":"Article 107279"},"PeriodicalIF":4.3000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Operations Research","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0305054825003089","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Heuristic methods provide a computationally efficient framework for addressing two-dimensional irregular packing problems, particularly in resource-constrained industrial settings. As a typical combinatorial optimization problem, irregular packing exhibits exponential growth in computational complexity with increasing workpiece counts, while the solution space dynamically reconfigures due to geometric variability among workpieces. Although heuristic algorithms can generate feasible layouts within acceptable timeframes, their reliance on fixed search rules limits adaptability to diverse scenarios, necessitating more flexible approaches. In this paper, a hyper-heuristic algorithm based on Q-Learning is proposed to solve open dimension packing problems, including one-open and two-open dimension problems. Q-Learning is adopted as the high-level strategy for its ability to optimize low-level heuristic selection through reward-driven experience accumulation. The method incorporates a mixed encoding scheme for solution representation, four specialized low-level heuristic operators, a linear population decline mechanism, and an elite preservation strategy to balance exploration and exploitation. The Q-Learning controller dynamically selects operators by updating the Q-table based on Bellman's equation. The proposed algorithm is compared with several advanced algorithms on general datasets. The results show that our method achieves better performance and applicability.
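The abstract's core mechanism is a Q-Learning controller that repeatedly selects one of the four low-level heuristic operators and updates a Q-table with a Bellman-style temporal-difference rule. The short Python sketch below illustrates only that control loop under stated assumptions: the operator names, the coarse three-state search model, the binary improvement reward, and the hyperparameter values are hypothetical and are not taken from the paper.

# Minimal sketch (not the authors' implementation): an epsilon-greedy Q-learning
# controller that picks one of four hypothetical low-level heuristic operators
# and updates its Q-table with the standard temporal-difference (Bellman) rule.
import random

# Hypothetical operator names standing in for the paper's four low-level heuristics.
OPERATORS = ["swap", "insert", "reverse", "rotate"]
N_STATES = 3                        # assumed coarse search states, e.g. improving / stagnant / worsening
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2   # assumed learning rate, discount factor, exploration rate

# Q-table: one row per state, one column per operator.
q_table = [[0.0] * len(OPERATORS) for _ in range(N_STATES)]

def select_operator(state: int) -> int:
    """Epsilon-greedy choice over the low-level heuristics."""
    if random.random() < EPS:
        return random.randrange(len(OPERATORS))
    row = q_table[state]
    return max(range(len(OPERATORS)), key=row.__getitem__)

def update_q(state: int, action: int, reward: float, next_state: int) -> None:
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state])
    td_error = reward + GAMMA * best_next - q_table[state][action]
    q_table[state][action] += ALPHA * td_error

# Toy driving loop: reward is 1 if a (simulated) packing objective improved, else 0.
state = 0
for step in range(100):
    action = select_operator(state)
    improved = random.random() < 0.5      # placeholder for applying the operator to a layout
    reward = 1.0 if improved else 0.0
    next_state = 0 if improved else 1
    update_q(state, action, reward, next_state)
    state = next_state

print([round(q, 3) for q in q_table[0]])

In the actual algorithm, the reward would be derived from the change in the packing objective (for example, the open strip length or the bounding rectangle area) after applying the selected operator to a layout, rather than from a simulated coin flip, and operator selection would sit inside the population-based search with linear population decline and elite preservation described in the abstract.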
Source Journal

Computers & Operations Research (Engineering & Technology; Engineering: Industrial)
CiteScore: 8.60
Self-citation rate: 8.70%
Articles published per year: 292
Review time: 8.5 months

Journal description: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.