Deep reinforcement learning for solving steelmaking-continuous casting scheduling problems under time-of-use tariffs

IF 7 · Zone 2 (Engineering & Technology) · JCR Q1 (ENGINEERING, INDUSTRIAL)

International Journal of Production Research · Pub Date: 2023-10-11 · DOI: 10.1080/00207543.2023.2267693

Ruilin Pan, Qiong Wang, Jianhua Cao, Chunliu Zhou


Citations: 0


Deep reinforcement learning for solving steelmaking-continuous casting scheduling problems under time-of-use tariffs

Abstract

This paper proposes, for the first time, an intelligent scheduling method based on deep reinforcement learning (DRL) to solve the multi-objective steelmaking-continuous casting (SCC) scheduling problem under time-of-use (TOU) tariffs. An intelligent scheduling system architecture is designed, and a mathematical model is established to minimise the total sojourn time and the electricity cost. To reduce production costs by avoiding peak periods of electricity consumption, the 'start time' of the system is generated within a Markov decision process (MDP) formulation; heuristic scheduling rules related to power cost form the action space, and the reward functions are designed according to the characteristics of the two objectives. To satisfy the continuous-casting requirement, a constraint particular to SCC, a backward strategy is developed. Additionally, a branching duelling double deep Q-network (BD3QN) is adapted to guide action selection and avoid blind search during iteration, and is then applied to real-time scheduling. Numerical experiments demonstrate that the proposed method outperforms the comparison algorithms by a large margin in both solution quality and CPU time.

Keywords: steelmaking-continuous casting; scheduling; deep reinforcement learning; time-of-use tariffs; multi-objective optimisation

Data availability statement

The authors confirm that the data supporting the findings of this study are available within the article [and/or] its supplementary materials.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Additional information

Funding

This research work is supported by the National Natural Science Foundation of China [grant number 71772002], University Natural Science Research Project of Anhui Province (Key Project) [grant number KJ2021A0384], and University Synergy Innovation Program of Anhui Province [grant number GXXT-2022-098].

Notes on contributors

Ruilin Pan received the Ph.D. degree in Enterprise Management from Dalian University of Technology, Dalian, China, in 2010. He is currently a Professor of Operations Management with the School of Management Science and Engineering, Anhui University of Technology, Anhui, China. His research interests include industrial data science, machine learning, and reinforcement learning. He has published papers in journals such as Annals of Operations Research, Swarm and Evolutionary Computation, Journal of Intelligent Manufacturing, European Journal of Operational Research, and Computers & Industrial Engineering.

Qiong Wang received the M.E. degree in Management Science and Engineering from Anhui University of Technology, Anhui, China, in 2022. Her research interests include operations planning and scheduling problems in production, mathematical modelling, optimisation and heuristic methods.

Jianhua Cao received the Ph.D. degree in Business Administration from Zhejiang University of Technology, Hangzhou, China, in 2022. She is currently an Associate Professor of Operations Management with the School of Management Science and Engineering, Anhui University of Technology, Anhui, China. Her research interests include operations research and optimisation, production scheduling and machine learning. She has published papers in journals such as Annals of Operations Research, Swarm and Evolutionary Computation, Transportation Letters, and Computers & Industrial Engineering.

Chunliu Zhou received the Ph.D. degree in Enterprise Management from the Dalian University of Technology, Dalian, China, in 2020. She is currently a Lecturer at the Department of Industrial Engineering, Anhui University of Technology, Anhui, China. Her research interests include production planning and control, product data management, and data-driven process management. She has published papers in journals such as Advanced Engineering Informatics and Industrial Engineering and Management.
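
The electricity-cost objective under TOU tariffs mentioned in the abstract can be illustrated with a small sketch. It is not the authors' model: the tariff bands, prices, power rating, and the function name `electricity_cost` are all hypothetical values chosen for illustration. The sketch only shows why the choice of start time matters when the price of electricity depends on the hour of day.

```python
from typing import List, Tuple

# Hypothetical TOU tariff (NOT from the paper): (start_hour, end_hour, price per kWh).
# The bands must cover the whole day [0, 24) without gaps.
TARIFF: List[Tuple[float, float, float]] = [
    (0.0, 8.0, 0.4),    # off-peak
    (8.0, 18.0, 0.8),   # mid-peak
    (18.0, 22.0, 1.2),  # peak
    (22.0, 24.0, 0.4),  # off-peak
]


def electricity_cost(start: float, duration: float, power_kw: float) -> float:
    """Cost of one operation that starts at hour `start`, runs for `duration`
    hours at constant power `power_kw`, and may cross tariff bands or midnight."""
    cost = 0.0
    t = start
    end = start + duration
    while t < end - 1e-12:
        hour = t % 24.0  # hour of day for the current instant
        for lo, hi, price in TARIFF:
            if lo <= hour < hi:
                # Advance to the end of the operation or of the tariff band,
                # whichever comes first, and charge that slice at the band price.
                step = min(end - t, hi - hour)
                cost += step * power_kw * price
                t += step
                break
    return cost


if __name__ == "__main__":
    # Same 2-hour, 100 kW operation started at 17:00 (runs into the peak band)
    # and at 06:00 (entirely off-peak).
    print(electricity_cost(17.0, 2.0, 100.0))
    print(electricity_cost(6.0, 2.0, 100.0))
```

With these assumed prices, starting the operation at 17:00 costs about 200 currency units, while starting it at 06:00 costs about 80. A scheduler that can shift start times, as the abstract describes, exploits exactly this difference, subject to the sojourn-time objective and the casting constraints that this sketch does not model.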


Source journal

International Journal of Production Research · Management Science / Engineering: Industrial

CiteScore

19.20

Self-citation rate

14.10%

Articles published

318

Review time

6.3 months

About the journal: The International Journal of Production Research (IJPR), published since 1961, is a well-established, highly successful and leading journal reporting manufacturing, production and operations management research. IJPR is published 24 times a year and includes papers on innovation management, design of products, manufacturing processes, production and logistics systems. Production economics, the essential behaviour of production resources and systems as well as the complex decision problems that arise in design, management and control of production and logistics systems are considered. IJPR is a journal for researchers and professors in mechanical engineering, industrial and systems engineering, operations research and management science, and business. It is also an informative reference for industrial managers looking to improve the efficiency and effectiveness of their production systems.