{"title":"动态配板问题的整数优化深度强化学习","authors":"Tianyang Li;Ying Meng;Lixin Tang;Yuxuan Zhang","doi":"10.1109/TCST.2025.3552543","DOIUrl":null,"url":null,"abstract":"This article investigates a dynamic slab assignment problem (DSAP) that arises in the slab production process of steel industry. In DSAP, a set of slabs and orders arrive dynamically at each time step of a planning period, and their information cannot be observed in advance. For a planning period, a series of decisions need to be made on allocating the slabs to customer orders, self-designed orders, or holding them in inventory to maximize total rewards. To address DSAP effectively, we formulate a Markov decision process (MDP) model and propose a deep reinforcement learning algorithm combined with an integer programming (DRLIP) model. DRLIP decomposes each decision time step into two stages, i.e., dynamic selection stage and static assignment stage. The dynamic selection stage primarily uses a double-pointer network (DPN) to select the slabs and orders to be involved in matching. In the static assignment stage, an extension of a multiknapsack problem is constructed based on the selected slabs and orders. We formulate an integer programming (IP) model to solve this multiknapsack problem for obtaining an optimal assignment decision, which in turn provides a reward for each time step. To evaluate the effectiveness of DRLIP, we use a global method, three advanced heuristic methods, and a scenario tree method for comparison on practical and randomly generated problem instances. Computational results show that DRLIP yields a mean gap of 21.5% versus the optimum from the global method and outperforms the other comparison methods.","PeriodicalId":13103,"journal":{"name":"IEEE Transactions on Control Systems Technology","volume":"33 5","pages":"1586-1600"},"PeriodicalIF":3.9000,"publicationDate":"2025-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep Reinforcement Learning With Integer Optimization for Dynamic Slab Assignment Problem\",\"authors\":\"Tianyang Li;Ying Meng;Lixin Tang;Yuxuan Zhang\",\"doi\":\"10.1109/TCST.2025.3552543\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This article investigates a dynamic slab assignment problem (DSAP) that arises in the slab production process of steel industry. In DSAP, a set of slabs and orders arrive dynamically at each time step of a planning period, and their information cannot be observed in advance. For a planning period, a series of decisions need to be made on allocating the slabs to customer orders, self-designed orders, or holding them in inventory to maximize total rewards. To address DSAP effectively, we formulate a Markov decision process (MDP) model and propose a deep reinforcement learning algorithm combined with an integer programming (DRLIP) model. DRLIP decomposes each decision time step into two stages, i.e., dynamic selection stage and static assignment stage. The dynamic selection stage primarily uses a double-pointer network (DPN) to select the slabs and orders to be involved in matching. In the static assignment stage, an extension of a multiknapsack problem is constructed based on the selected slabs and orders. We formulate an integer programming (IP) model to solve this multiknapsack problem for obtaining an optimal assignment decision, which in turn provides a reward for each time step. 
To evaluate the effectiveness of DRLIP, we use a global method, three advanced heuristic methods, and a scenario tree method for comparison on practical and randomly generated problem instances. Computational results show that DRLIP yields a mean gap of 21.5% versus the optimum from the global method and outperforms the other comparison methods.\",\"PeriodicalId\":13103,\"journal\":{\"name\":\"IEEE Transactions on Control Systems Technology\",\"volume\":\"33 5\",\"pages\":\"1586-1600\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-03-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Control Systems Technology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10945997/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Control Systems Technology","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10945997/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Deep Reinforcement Learning With Integer Optimization for Dynamic Slab Assignment Problem
Abstract: This article investigates a dynamic slab assignment problem (DSAP) that arises in the slab production process of the steel industry. In DSAP, a set of slabs and orders arrives dynamically at each time step of a planning period, and their information cannot be observed in advance. Over a planning period, a series of decisions must be made on allocating the slabs to customer orders, to self-designed orders, or to inventory so as to maximize the total reward. To address DSAP effectively, we formulate a Markov decision process (MDP) model and propose a deep reinforcement learning algorithm combined with integer programming (DRLIP). DRLIP decomposes each decision time step into two stages, i.e., a dynamic selection stage and a static assignment stage. The dynamic selection stage primarily uses a double-pointer network (DPN) to select the slabs and orders to be involved in matching. In the static assignment stage, an extension of a multiknapsack problem is constructed from the selected slabs and orders. We formulate an integer programming (IP) model to solve this multiknapsack problem and obtain an optimal assignment decision, which in turn provides a reward for each time step. To evaluate the effectiveness of DRLIP, we compare it against a global method, three advanced heuristic methods, and a scenario tree method on practical and randomly generated problem instances. Computational results show that DRLIP yields a mean gap of 21.5% versus the optimum from the global method and outperforms the other comparison methods.
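The abstract describes the static assignment stage only at a high level. The sketch below is a minimal, hypothetical illustration of how assigning the selected slabs to the selected orders could be posed as a multiknapsack-style integer program, here modeled with the open-source PuLP package and its bundled CBC solver. The function name, reward matrix, weights, and capacities are illustrative assumptions and do not reproduce the paper's actual formulation.

```python
# Minimal sketch (assumed, not the paper's exact model): assign each selected
# slab to at most one selected order, respect each order's remaining weight
# capacity, and maximize the total matching reward -- a multiknapsack-style IP
# in the spirit of the static assignment stage described in the abstract.
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, PULP_CBC_CMD


def solve_static_assignment(slab_weights, order_capacities, reward):
    """slab_weights[i]: weight of slab i; order_capacities[j]: remaining
    capacity of order j; reward[i][j]: payoff of assigning slab i to order j."""
    slabs = range(len(slab_weights))
    orders = range(len(order_capacities))

    prob = LpProblem("static_slab_assignment", LpMaximize)
    x = {(i, j): LpVariable(f"x_{i}_{j}", cat=LpBinary)
         for i in slabs for j in orders}

    # Objective: total reward of the chosen slab-order matches.
    prob += lpSum(reward[i][j] * x[i, j] for i in slabs for j in orders)

    # Each slab goes to at most one order (unassigned slabs stay in inventory).
    for i in slabs:
        prob += lpSum(x[i, j] for j in orders) <= 1

    # Knapsack constraint: slabs assigned to an order must fit its capacity.
    for j in orders:
        prob += lpSum(slab_weights[i] * x[i, j] for i in slabs) <= order_capacities[j]

    prob.solve(PULP_CBC_CMD(msg=False))
    return {(i, j): 1 for i in slabs for j in orders
            if x[i, j].value() is not None and x[i, j].value() > 0.5}


if __name__ == "__main__":
    # Illustrative numbers only: three slabs, two orders.
    assignment = solve_static_assignment(
        slab_weights=[20, 35, 15],
        order_capacities=[40, 30],
        reward=[[8, 5], [9, 7], [4, 6]],
    )
    print(assignment)  # prints the chosen slab-order pairs
```

In the full DRLIP scheme sketched in the abstract, the inputs to such a model would come from the dynamic selection stage (the DPN's choice of slabs and orders), and the resulting objective value would serve as the per-step reward fed back to the learning agent.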
Journal Introduction:
The IEEE Transactions on Control Systems Technology publishes high-quality technical papers on technological advances in control engineering. The word technology is from the Greek technologia; its modern meaning is a scientific method to achieve a practical purpose. Control systems technology includes all aspects of control engineering needed to implement practical control systems, from analysis and design, through simulation, to hardware. A primary purpose of the IEEE Transactions on Control Systems Technology is to provide an archival publication that bridges the gap between theory and practice. The Transactions publishes papers that disclose significant new knowledge, exploratory developments, or practical applications in all aspects of the technology needed to implement control systems.