Q-learning-based hyper-heuristic algorithm for open dimension irregular packing problems

Impact Factor 4.3 | CAS Zone 2 (Engineering & Technology) | JCR Q2, Computer Science, Interdisciplinary Applications
Yongchun Wang, Qingjin Peng, Zhen Wang, Shuiquan Huang, Zhengkai Xu, Chuanzhen Huang, Baosu Guo
{"title":"Q-learning-based hyper-heuristic algorithm for open dimension irregular packing problems","authors":"Yongchun Wang ,&nbsp;Qingjin Peng ,&nbsp;Zhen Wang ,&nbsp;Shuiquan Huang ,&nbsp;Zhengkai Xu ,&nbsp;Chuanzhen Huang ,&nbsp;Baosu Guo","doi":"10.1016/j.cor.2025.107279","DOIUrl":null,"url":null,"abstract":"<div><div>Heuristic methods provide a computationally efficient framework for addressing two-dimensional irregular packing problems, particularly in resource-constrained industrial settings. As a typical combinatorial optimization problem, irregular packing exhibits exponential growth in computational complexity with increasing workpiece counts, while the solution space dynamically reconfigures due to geometric variability among workpieces. Although heuristic algorithms can generate feasible layouts within acceptable timeframes, their reliance on fixed search rule limits adaptability to diverse scenarios, necessitating more flexible approaches. In this paper, a hyper-heuristic algorithm based on Q-Learning is proposed to solve open dimension packing problems, including one-open and two-open dimension problems. Q-Learning is adopted as the high-level strategy for its ability to optimize low-level heuristic selection through reward-driven experience accumulation. The method incorporates a mixed encoding method for solution representation, four specialized low-level heuristic operators, a linear population decline mechanism, and an elite preservation strategy to balance exploration–exploitation. The Q-Learning controller dynamically selects operators by updating the Q-table based on Bellman’s equation. The proposed algorithm is compared to some advanced algorithms in general datasets. The results show that our method has better performance and applicability.</div></div>","PeriodicalId":10542,"journal":{"name":"Computers & Operations Research","volume":"185 ","pages":"Article 107279"},"PeriodicalIF":4.3000,"publicationDate":"2025-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Operations Research","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0305054825003089","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Heuristic methods provide a computationally efficient framework for addressing two-dimensional irregular packing problems, particularly in resource-constrained industrial settings. As a typical combinatorial optimization problem, irregular packing exhibits exponential growth in computational complexity with increasing workpiece counts, while the solution space dynamically reconfigures due to geometric variability among workpieces. Although heuristic algorithms can generate feasible layouts within acceptable timeframes, their reliance on fixed search rules limits adaptability to diverse scenarios, necessitating more flexible approaches. In this paper, a hyper-heuristic algorithm based on Q-Learning is proposed to solve open dimension packing problems, including one-open and two-open dimension problems. Q-Learning is adopted as the high-level strategy for its ability to optimize low-level heuristic selection through reward-driven experience accumulation. The method incorporates a mixed encoding scheme for solution representation, four specialized low-level heuristic operators, a linear population decline mechanism, and an elite preservation strategy to balance exploration and exploitation. The Q-Learning controller dynamically selects operators by updating the Q-table based on Bellman's equation. The proposed algorithm is compared with several advanced algorithms on general datasets. The results show that our method achieves better performance and applicability.
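The abstract's core mechanism is a Q-Learning controller that repeatedly selects one of the four low-level heuristic operators and updates a Q-table with a Bellman-style temporal-difference rule. The short Python sketch below illustrates only that control loop under stated assumptions: the operator names, the coarse three-state search model, the binary improvement reward, and the hyperparameter values are hypothetical and are not taken from the paper.

# Minimal sketch (not the authors' implementation): an epsilon-greedy Q-learning
# controller that picks one of four hypothetical low-level heuristic operators
# and updates its Q-table with the standard temporal-difference (Bellman) rule.
import random

# Hypothetical operator names standing in for the paper's four low-level heuristics.
OPERATORS = ["swap", "insert", "reverse", "rotate"]
N_STATES = 3                        # assumed coarse search states, e.g. improving / stagnant / worsening
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2   # assumed learning rate, discount factor, exploration rate

# Q-table: one row per state, one column per operator.
q_table = [[0.0] * len(OPERATORS) for _ in range(N_STATES)]

def select_operator(state: int) -> int:
    """Epsilon-greedy choice over the low-level heuristics."""
    if random.random() < EPS:
        return random.randrange(len(OPERATORS))
    row = q_table[state]
    return max(range(len(OPERATORS)), key=row.__getitem__)

def update_q(state: int, action: int, reward: float, next_state: int) -> None:
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q_table[next_state])
    td_error = reward + GAMMA * best_next - q_table[state][action]
    q_table[state][action] += ALPHA * td_error

# Toy driving loop: reward is 1 if a (simulated) packing objective improved, else 0.
state = 0
for step in range(100):
    action = select_operator(state)
    improved = random.random() < 0.5      # placeholder for applying the operator to a layout
    reward = 1.0 if improved else 0.0
    next_state = 0 if improved else 1
    update_q(state, action, reward, next_state)
    state = next_state

print([round(q, 3) for q in q_table[0]])

In the actual algorithm, the reward would be derived from the change in the packing objective (for example, the open strip length or the bounding rectangle area) after applying the selected operator to a layout, rather than from a simulated coin flip, and operator selection would sit inside the population-based search with linear population decline and elite preservation described in the abstract.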
Source Journal

Computers & Operations Research (Engineering & Technology; Engineering: Industrial)
CiteScore: 8.60
Self-citation rate: 8.70%
Articles published per year: 292
Review time: 8.5 months

Journal description: Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.