Reinforcement learning-guided two-stage optimization framework for multi-product batch scheduling
Jiawen Zhu, Wenli Du, Chen Fan, Muyi Huang, Chuan Wang, Furong Zhang
Computers & Chemical Engineering, volume 204, Article 109399. Published 2025-09-13.
DOI: 10.1016/j.compchemeng.2025.109399
https://www.sciencedirect.com/science/article/pii/S0098135425004028
Citations: 0
Abstract
With the increasing demand for high-end and fine manufacturing, multi-product batch scheduling has become essential in process industries. Its inherent complexity stems from hybrid decision variables and tightly coupled constraints. To address these challenges, this study proposes a two-stage optimization framework that integrates reinforcement learning (RL) and mathematical programming (MP). The RL layer determines batch allocations and production sequences, which are then transmitted as time windows within which the MP layer optimizes continuous variables to ensure feasibility. To handle hybrid action spaces, a mapping mechanism is introduced to unify discrete and continuous decisions. In addition, dynamic short-term targets based on reformulated constraints are designed to address the sparsity of rewards caused by long-horizon objectives. Experiments on polyolefin production scheduling demonstrate that the proposed method outperforms MP and standalone RL in terms of economic profit, production stability, and computational performance.
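The two-stage split described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the score-sorting decoder, the single-unit setting, the fixed time windows, and the names `decode_action` and `schedule_in_windows` are all assumptions made for illustration, and a simple forward pass stands in for the mathematical programming layer.

```python
from typing import List, Optional, Tuple

# Hypothetical batch record: (name, processing time, (window start, window end)).
Batch = Tuple[str, float, Tuple[float, float]]


def decode_action(raw: List[float]) -> List[int]:
    """Stage 1 stand-in: map a continuous action vector to a discrete sequence.

    Sorting batch indices by their continuous scores is one common way to turn
    a continuous policy output into a permutation. The paper's actual mapping
    mechanism is not given in the abstract and may differ.
    """
    return sorted(range(len(raw)), key=lambda i: raw[i])


def schedule_in_windows(
    batches: List[Batch], order: List[int]
) -> Optional[List[Tuple[str, float, float]]]:
    """Stage 2 stand-in: fix the sequence, then choose continuous start times.

    For a fixed sequence on a single unit, starting each batch as early as its
    window and its predecessor allow gives the earliest completion times, so a
    forward pass replaces the solver here. Returns None when a batch cannot
    finish inside its window.
    """
    clock = 0.0
    plan = []
    for i in order:
        name, duration, (window_start, window_end) = batches[i]
        start = max(clock, window_start)
        end = start + duration
        if end > window_end:
            return None  # the proposed sequence is infeasible for these windows
        plan.append((name, start, end))
        clock = end
    return plan


if __name__ == "__main__":
    # Made-up example data, not taken from the paper.
    demo_batches = [
        ("A", 2.0, (0.0, 5.0)),
        ("B", 3.0, (1.0, 10.0)),
        ("C", 1.0, (0.0, 3.0)),
    ]
    demo_order = decode_action([0.7, 0.9, 0.1])  # hypothetical policy output
    print(demo_order, schedule_in_windows(demo_batches, demo_order))
```

In a learning loop, a `None` result from the second stage could be turned into a penalty for the first stage, which is one plausible way for feasibility information to flow back to the sequence-proposing layer.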
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.