Simulation-based deep reinforcement learning for multi-objective identical parallel machine scheduling problem

IF 3.9 | Zone 3, Engineering Technology | Q2, ENGINEERING, MARINE

International Journal of Naval Architecture and Ocean Engineering | Pub Date: 2024-01-01 | DOI: 10.1016/j.ijnaoe.2024.100629

Sohyun Nam, Young-in Cho, Jong Hun Woo

Volume 16, Article 100629. Full text: https://www.sciencedirect.com/science/article/pii/S2092678224000487

Citations: 0

Abstract


In the shipbuilding industry, traditional optimization studies based on linear programming and constraint programming have been conducted to solve mid-term or long-term scheduling problems. However, due to the extensive computational time, these methods face limitations in addressing short-term scheduling problems for the unit production systems of shipbuilding processes, where various environmental uncertainties must be considered. This study employs a deep reinforcement learning approach to develop a dynamic scheduling algorithm for the welding process in profile shops, considering the random arrival of materials and variability in processing time. The scheduling problems of the welding process are formulated as multi-objective identical parallel machine scheduling problems, aimed at minimizing both setup time and tardiness. This study proposes a novel Markov decision process model for the multi-objective scheduling problems for the welding process, incorporating setup requirements and due date-related constraints into the state representation, action modelling, and reward design. Additionally, based on the proposed Markov decision process model, this study develops a learning environment in which a discrete-event simulation model of the welding process is integrated for state transition considering the uncertainties in the welding process. In the training phase of the scheduling agent, the Proximal Policy Optimization algorithm is applied to learn the scheduling policy, which is approximated by deep neural networks. The performance of the proposed algorithm is validated in comparison to four priority rules (SSPT, ATCS, MDD, and COVERT) for various test scenarios with different workloads and levels of variability in processing time.
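To make the two objectives and the priority-rule baselines concrete, the following is a minimal sketch, not the authors' simulation model or their PPO agent. It dispatches jobs on identical parallel machines and reports total setup time and total tardiness. Everything named here is an assumption for illustration: the `Job` fields, the constant `SETUP` charged whenever the job family changes on a machine, and the textbook readings of two of the four rules (SSPT as shortest setup plus processing time, MDD as modified due date). The study's environment is stochastic, with random arrivals and variable processing times; this sketch is deterministic, and the paper's exact rule variants may differ.

```python
import heapq
from dataclasses import dataclass

SETUP = 2.0  # hypothetical setup time charged when the job family changes


@dataclass(frozen=True)
class Job:
    name: str
    family: str     # hypothetical setup family (e.g. a profile type)
    arrival: float  # time at which the job becomes available
    proc: float     # processing time
    due: float      # due date


def setup_time(prev_family, job):
    """No setup on an empty machine or when the family is unchanged."""
    return 0.0 if prev_family in (None, job.family) else SETUP


def sspt(job, now, prev_family):
    """Shortest setup plus processing time first (smaller is better)."""
    return setup_time(prev_family, job) + job.proc


def mdd(job, now, prev_family):
    """Modified due date: max(due date, earliest possible completion)."""
    return max(job.due, now + setup_time(prev_family, job) + job.proc)


def simulate(jobs, n_machines, rule):
    """Return (total setup time, total tardiness) under a dispatching rule."""
    pending = sorted(jobs, key=lambda j: j.arrival)  # jobs not yet arrived
    queue = []                                       # arrived, waiting jobs
    free_at = [(0.0, m) for m in range(n_machines)]  # (free time, machine id)
    heapq.heapify(free_at)
    last_family = [None] * n_machines
    total_setup = total_tardiness = 0.0
    clock, i = 0.0, 0
    while i < len(pending) or queue:
        free_time, m = heapq.heappop(free_at)        # earliest free machine
        clock = max(clock, free_time)
        if not queue:                                # idle until next arrival
            clock = max(clock, pending[i].arrival)
        while i < len(pending) and pending[i].arrival <= clock:
            queue.append(pending[i])
            i += 1
        # Pick the waiting job with the smallest priority index (ties by name).
        job = min(queue, key=lambda j: (rule(j, clock, last_family[m]), j.name))
        queue.remove(job)
        s = setup_time(last_family[m], job)
        finish = clock + s + job.proc
        total_setup += s
        total_tardiness += max(0.0, finish - job.due)
        last_family[m] = job.family
        heapq.heappush(free_at, (finish, m))
    return total_setup, total_tardiness


DEMO_JOBS = [
    Job("A", "x", 0.0, 3.0, 4.0),
    Job("B", "y", 0.0, 1.0, 3.0),
    Job("C", "x", 0.0, 2.0, 10.0),
]

if __name__ == "__main__":
    print(simulate(DEMO_JOBS, 1, sspt))  # (2.0, 4.0)
    print(simulate(DEMO_JOBS, 1, mdd))   # (2.0, 2.0)
```

On this toy instance both rules incur the same setup time, but MDD halves the tardiness because it pulls the urgent job A forward. A learned scheduling policy would take the place of `rule`, choosing the next job from a state that encodes setup and due-date information.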


Source journal

International Journal of Naval Architecture and Ocean Engineering (ENGINEERING, MARINE)

CiteScore: 4.90

Self-citation rate: 4.50%

Review time: 12 months

About the journal: International Journal of Naval Architecture and Ocean Engineering provides a forum for engineers and scientists from a wide range of disciplines to present and discuss various phenomena in the utilization and preservation of the ocean environment. Without being limited by traditional categorization, authors are encouraged to present advanced technology development and scientific research, as long as the work is aimed at more and better human engagement with the ocean environment. Topics include, but are not limited to: marine hydrodynamics; structural mechanics; marine propulsion systems; design methodology and practice; production technology; system dynamics and control; marine equipment technology; materials science; underwater acoustics; ocean remote sensing; information technology related to ship and marine systems; ocean energy systems; marine environmental engineering; maritime safety engineering; polar and arctic engineering; coastal and port engineering; subsea engineering; and specialized watercraft engineering.