Eduardo M. Silva, Antônio A. Chaves, Silvio A. de Araujo, Raf Jans
Random-Key Optimizer with reinforcement learning for the Capacitated Multi-period Cutting Stock Problem with setup cost
Computers & Operations Research, volume 183, Article 107159. Published 2025-06-10.
DOI: 10.1016/j.cor.2025.107159
URL: https://www.sciencedirect.com/science/article/pii/S030505482500187X
Citations: 0
Abstract
This paper introduces a Random-Key Optimizer (RKO) procedure that incorporates reinforcement learning to solve the One-Dimensional Multi-Period Cutting Stock Problem (MPCSP) with setup costs and capacity constraints. The MPCSP involves determining cutting plans for each period to meet customer demands, where inventory variables link consecutive periods. The RKO represents solutions as random-key vectors, which a decoder maps to feasible MPCSP solutions. During the optimization process, the RKO dynamically adapts its parameters using reinforcement learning. The framework integrates a Biased Random-Key Genetic Algorithm (BRKGA), Particle Swarm Optimization (PSO), and Simulated Annealing (SA), all of which use a unified decoder function. A novel penalization mechanism is also introduced within the decoder to handle infeasibilities. The proposed RKO is evaluated on benchmark instances from the literature and compared against state-of-the-art methods, including a hybrid column generation heuristic and a dynamic programming-based heuristic. In addition, a new set of large-scale instances is introduced for further evaluation. Computational experiments show that the BRKGA-based RKO consistently outperforms the other solution methods on the benchmark instances, delivering superior average solution quality. A sensitivity analysis examines the impact of setup costs and production capacity. Finally, the study compares the RKO framework with and without reinforcement learning.
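The random-key idea in the abstract can be illustrated with a minimal sketch. This is not the authors' decoder: it assumes a single period, one key per item, a first-fit placement rule, and a simple linear penalty when the number of stock objects exceeds a capacity limit. The names `decode` and `cost`, the parameter `capacity`, and the default penalty weight are all hypothetical and chosen only for illustration.

```python
def decode(keys, item_lengths, stock_length):
    """Map a random-key vector to a cutting plan (illustrative first-fit rule).

    Sorting the keys yields an item order; each item is then cut from the
    first opened stock object that still has enough residual length.
    """
    if len(keys) != len(item_lengths):
        raise ValueError("one key per item is required")
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    plan = []      # plan[b] lists the item indices cut from stock object b
    residual = []  # residual[b] is the unused length of stock object b
    for i in order:
        length = item_lengths[i]
        if length > stock_length:
            raise ValueError("item is longer than a stock object")
        for b, free in enumerate(residual):
            if length <= free:
                plan[b].append(i)
                residual[b] -= length
                break
        else:
            # No opened stock object fits the item, so open a new one.
            plan.append([i])
            residual.append(stock_length - length)
    return plan


def cost(keys, item_lengths, stock_length, capacity, penalty=1000.0):
    """Number of stock objects used plus a penalty for exceeding capacity.

    The penalty term is an assumed stand-in for handling infeasibility
    inside the decoder; the paper's actual mechanism is not reproduced here.
    """
    used = len(decode(keys, item_lengths, stock_length))
    excess = max(0, used - capacity)
    return used + penalty * excess


if __name__ == "__main__":
    keys = [0.9, 0.1, 0.5]
    lengths = [6, 5, 4]
    print(decode(keys, lengths, stock_length=10))
    print(cost(keys, lengths, stock_length=10, capacity=1))
```

Because every key vector decodes to some plan, a metaheuristic such as BRKGA, PSO, or SA can search the continuous key space and rely on the same decoder to evaluate candidates, which is the design choice the abstract attributes to the RKO framework.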
About the journal:
Operations research and computers meet in a large number of scientific fields, many of which are of vital current concern to our troubled society. These include, among others, ecology, transportation, safety, reliability, urban planning, economics, inventory control, investment strategy and logistics (including reverse logistics). Computers & Operations Research provides an international forum for the application of computers and operations research techniques to problems in these and related fields.