Deep Reinforcement Learning With Integer Optimization for Dynamic Slab Assignment Problem

IF 3.9 · JCR Q1 (Automation & Control Systems) · CAS Tier 2 (Computer Science)
Tianyang Li;Ying Meng;Lixin Tang;Yuxuan Zhang
IEEE Transactions on Control Systems Technology, vol. 33, no. 5, pp. 1586-1600. Published 2025-03-31. DOI: 10.1109/TCST.2025.3552543
Citations: 0

Abstract

This article investigates a dynamic slab assignment problem (DSAP) that arises in the slab production process of steel industry. In DSAP, a set of slabs and orders arrive dynamically at each time step of a planning period, and their information cannot be observed in advance. For a planning period, a series of decisions need to be made on allocating the slabs to customer orders, self-designed orders, or holding them in inventory to maximize total rewards. To address DSAP effectively, we formulate a Markov decision process (MDP) model and propose a deep reinforcement learning algorithm combined with an integer programming (DRLIP) model. DRLIP decomposes each decision time step into two stages, i.e., dynamic selection stage and static assignment stage. The dynamic selection stage primarily uses a double-pointer network (DPN) to select the slabs and orders to be involved in matching. In the static assignment stage, an extension of a multiknapsack problem is constructed based on the selected slabs and orders. We formulate an integer programming (IP) model to solve this multiknapsack problem for obtaining an optimal assignment decision, which in turn provides a reward for each time step. To evaluate the effectiveness of DRLIP, we use a global method, three advanced heuristic methods, and a scenario tree method for comparison on practical and randomly generated problem instances. Computational results show that DRLIP yields a mean gap of 21.5% versus the optimum from the global method and outperforms the other comparison methods.
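As the abstract describes, the static assignment stage reduces to an extension of a multiknapsack problem: each selected slab is assigned to at most one selected order (or held in inventory for zero immediate reward), subject to each order's capacity. The sketch below illustrates that structure on a toy instance; all weights, capacities, and rewards are hypothetical, and where the paper solves this stage with an integer-programming model, this sketch simply brute-forces the same optimum for a small instance.

```python
# Illustrative sketch of the static assignment stage (hypothetical data).
# Each slab goes to one order or to inventory (None); an order's total slab
# weight may not exceed its capacity; holding a slab in inventory earns 0.
from itertools import product

def best_assignment(slab_weights, order_caps, reward):
    """Enumerate all slab->order assignments and return (best_value, best_plan).

    reward[s][o] is the payoff of assigning slab s to order o.
    best_plan is a tuple: entry s is an order index or None (inventory).
    """
    n_slabs, n_orders = len(slab_weights), len(order_caps)
    best_value, best_plan = 0.0, tuple([None] * n_slabs)
    for plan in product([None] + list(range(n_orders)), repeat=n_slabs):
        load = [0.0] * n_orders
        value = 0.0
        feasible = True
        for s, o in enumerate(plan):
            if o is None:
                continue  # slab held in inventory, no reward, no load
            load[o] += slab_weights[s]
            if load[o] > order_caps[o]:
                feasible = False
                break
            value += reward[s][o]
        if feasible and value > best_value:
            best_value, best_plan = value, plan
    return best_value, best_plan

# Toy instance: 3 slabs, 2 orders.
weights = [4.0, 3.0, 2.0]
caps = [5.0, 4.0]
reward = [[6.0, 5.0],   # slab 0
          [4.0, 4.5],   # slab 1
          [3.0, 2.0]]   # slab 2
value, plan = best_assignment(weights, caps, reward)
```

At real problem sizes this enumeration is exponential, which is why the paper formulates an IP model for this stage; the sketch only makes the decision structure concrete.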
Source Journal

IEEE Transactions on Control Systems Technology (Engineering: Electrical & Electronic)
CiteScore: 10.70
Self-citation rate: 2.10%
Annual articles: 218
Review time: 6.7 months
Journal description: The IEEE Transactions on Control Systems Technology publishes high-quality technical papers on technological advances in control engineering. The word technology comes from the Greek technologia; its modern meaning is a scientific method to achieve a practical purpose. Control systems technology includes all aspects of control engineering needed to implement practical control systems, from analysis and design through simulation and hardware. A primary purpose of the journal is to serve as an archival publication that bridges the gap between theory and practice, publishing papers that disclose significant new knowledge, exploratory developments, or practical applications in all aspects of implementing control systems.