{"title":"Deep reinforcement learning for solving steelmaking-continuous casting scheduling problems under time-of-use tariffs","authors":"Ruilin Pan, Qiong Wang, Jianhua Cao, Chunliu Zhou","doi":"10.1080/00207543.2023.2267693","DOIUrl":null,"url":null,"abstract":"AbstractThis paper proposes a novel intelligent scheduling method based on deep reinforcement learning (DRL) to solve the multi-objective steelmaking-continuous casting (SCC) scheduling problem, under time-of-use (TOU) tariffs for the first time. The intelligent scheduling system architecture is designed, and a mathematical model is established to minimise the total sojourn time and electricity cost. To effectively reduce production costs by avoiding peak periods of electricity consumption, the ‘start time’ of the system is generated based on the Markov Decision Process (MDP), and heuristic scheduling rules related to power cost are used as the action space, with corresponding reward functions designed according to the characteristics of these two objectives. To satisfy the continuous casting which is a particular SCC constraint, a backward strategy is developed. Additionally, a branching duelling double deep Q-network (BD3QN) is adapted to guide action selection and avoid blind search in the iteration process, and then applied to real-time scheduling. 
Numerical experiments demonstrate that the proposed method outperforms comparison algorithms in terms of solution quality and CPU times by a large margin.KEYWORDS: Steelmaking-continuous castingschedulingdeep reinforcement learningtime-of-use tariffsmulti-objective optimisation Data availability statementThe authors confirm that the data supporting the findings of this study are available within the article [and/or] its supplementary materials.Disclosure statementNo potential conflict of interest was reported by the author(s).Additional informationFundingThis research work is supported by the National Natural Science Foundation of China [grant number 71772002], University Natural Science Research Project of Anhui Province (Key Project) [grant number KJ2021A0384], University Synergy Innovation Program of Anhui Province [grant number GXXT-2022-098].Notes on contributorsRuilin PanRuilin Pan received the Ph.D. degree in Enterprise Management from Dalian University of Technology, Dalian, China, in 2010. He is currently a Professor of Operations Management with the School of Management Science and Engineering, Anhui University of Technology, Anhui, China. His research interests include industrial data science, machine learning, and reinforcement learning. He has published papers in journals such as Annals of Operations Research, Swarm and Evolutionary Computation, Journal of Intelligent Manufacturing, European Journal of Operational Research, and Computers & Industrial Engineering.Qiong WangQiong Wang received the M.E. degree in Management Science and Engineering from Anhui University of Technology, Anhui, China, in 2022. Her research interests include operations planning and scheduling problems in production, mathematical modelling, optimisation and heuristic methods.Jianhua CaoJianhua Cao received the Ph.D. degree in Business Administration from Zhejiang University of Technology, Hangzhou, China, in 2022. 
She is currently an Associate Professor of Operations Management with the School of Management Science and Engineering, Anhui University of Technology, Anhui, China. Her research interests include operations research and optimisation, production scheduling and machine learning. She has published papers in journals such as Annals of Operations Research, Swarm and Evolutionary Computation, Transportation Letters, and Computers & Industrial Engineering.Chunliu ZhouChunliu Zhou received the Ph.D. degree in Enterprise Management from the Dalian University of Technology, Dalian, China, in 2020. She is currently a Lecturer at the Department of Industrial Engineering, Anhui University of Technology, Anhui, China. Her research interests include production planning and control, product data management, and data-driven process management. She has published papers in journals such as Advanced Engineering Informatics and Industrial Engineering and Management.","PeriodicalId":14307,"journal":{"name":"International Journal of Production Research","volume":"25 1","pages":"0"},"PeriodicalIF":7.0000,"publicationDate":"2023-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Production Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/00207543.2023.2267693","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, INDUSTRIAL","Score":null,"Total":0}
Citations: 0
Abstract
This paper proposes a novel intelligent scheduling method based on deep reinforcement learning (DRL) to solve the multi-objective steelmaking-continuous casting (SCC) scheduling problem, under time-of-use (TOU) tariffs for the first time. The intelligent scheduling system architecture is designed, and a mathematical model is established to minimise the total sojourn time and electricity cost. To effectively reduce production costs by avoiding peak periods of electricity consumption, the ‘start time’ of the system is generated based on the Markov Decision Process (MDP), and heuristic scheduling rules related to power cost are used as the action space, with corresponding reward functions designed according to the characteristics of these two objectives. To satisfy the continuous casting which is a particular SCC constraint, a backward strategy is developed. Additionally, a branching duelling double deep Q-network (BD3QN) is adapted to guide action selection and avoid blind search in the iteration process, and then applied to real-time scheduling. Numerical experiments demonstrate that the proposed method outperforms comparison algorithms in terms of solution quality and CPU times by a large margin.

KEYWORDS: Steelmaking-continuous casting; scheduling; deep reinforcement learning; time-of-use tariffs; multi-objective optimisation

Data availability statement
The authors confirm that the data supporting the findings of this study are available within the article [and/or] its supplementary materials.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This research work is supported by the National Natural Science Foundation of China [grant number 71772002], University Natural Science Research Project of Anhui Province (Key Project) [grant number KJ2021A0384], University Synergy Innovation Program of Anhui Province [grant number GXXT-2022-098].

Notes on contributors
Ruilin Pan received the Ph.D. degree in Enterprise Management from Dalian University of Technology, Dalian, China, in 2010. He is currently a Professor of Operations Management with the School of Management Science and Engineering, Anhui University of Technology, Anhui, China. His research interests include industrial data science, machine learning, and reinforcement learning. He has published papers in journals such as Annals of Operations Research, Swarm and Evolutionary Computation, Journal of Intelligent Manufacturing, European Journal of Operational Research, and Computers & Industrial Engineering.

Qiong Wang received the M.E. degree in Management Science and Engineering from Anhui University of Technology, Anhui, China, in 2022. Her research interests include operations planning and scheduling problems in production, mathematical modelling, optimisation and heuristic methods.

Jianhua Cao received the Ph.D. degree in Business Administration from Zhejiang University of Technology, Hangzhou, China, in 2022. She is currently an Associate Professor of Operations Management with the School of Management Science and Engineering, Anhui University of Technology, Anhui, China. Her research interests include operations research and optimisation, production scheduling and machine learning. She has published papers in journals such as Annals of Operations Research, Swarm and Evolutionary Computation, Transportation Letters, and Computers & Industrial Engineering.

Chunliu Zhou received the Ph.D. degree in Enterprise Management from the Dalian University of Technology, Dalian, China, in 2020. She is currently a Lecturer at the Department of Industrial Engineering, Anhui University of Technology, Anhui, China. Her research interests include production planning and control, product data management, and data-driven process management. She has published papers in journals such as Advanced Engineering Informatics and Industrial Engineering and Management.
About the journal:
The International Journal of Production Research (IJPR), published since 1961, is a well-established, highly successful and leading journal reporting manufacturing, production and operations management research.
IJPR is published 24 times a year and includes papers on innovation management, design of products, manufacturing processes, production and logistics systems. Production economics, the essential behaviour of production resources and systems as well as the complex decision problems that arise in design, management and control of production and logistics systems are considered.
IJPR is a journal for researchers and professors in mechanical engineering, industrial and systems engineering, operations research and management science, and business. It is also an informative reference for industrial managers looking to improve the efficiency and effectiveness of their production systems.