{"title":"Risk-averse supply chain management via robust reinforcement learning","authors":"Jing Wang , Christopher L.E. Swartz , Kai Huang","doi":"10.1016/j.compchemeng.2024.108912","DOIUrl":null,"url":null,"abstract":"<div><div>Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, <span><math><mover><mrow><mi>Q</mi></mrow><mrow><mo>ˆ</mo></mrow></mover></math></span>-learning and <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning, are examined against conventional <span><math><mi>Q</mi></math></span>-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that <span><math><mi>Q</mi></math></span>-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"192 ","pages":"Article 108912"},"PeriodicalIF":3.9000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135424003302","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0
Abstract
Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, Q̂-learning and β-pessimistic Q-learning, are examined against conventional Q-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that Q-learning can outperform the baseline policy under normal conditions, but degrades notably under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned β-pessimistic Q-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.
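To make the two policies named in the abstract concrete, the following is a minimal tabular sketch: the β-pessimistic Q-learning update in its common formulation, where the bootstrap target blends the best-case and worst-case next-state action values (β = 0 recovers standard Q-learning), together with the baseline order-up-to rule. The state/action encoding, parameter values, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def beta_pessimistic_q_update(Q, s, a, r, s_next,
                              alpha=0.1, gamma=0.99, beta=0.2):
    """One tabular beta-pessimistic Q-learning update (hypothetical sketch).

    The target mixes the optimistic (max) and pessimistic (min) bootstrap
    values; a larger beta yields more conservative value estimates and,
    hence, more risk-averse ordering policies. beta = 0 is plain Q-learning.
    """
    best = np.max(Q[s_next])   # best-case next-state value
    worst = np.min(Q[s_next])  # worst-case next-state value
    target = r + gamma * ((1.0 - beta) * best + beta * worst)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q


def order_up_to(inventory_position, base_stock_level):
    """Baseline order-up-to policy: order enough to raise the inventory
    position to the base-stock level S, never a negative quantity."""
    return max(0, base_stock_level - inventory_position)


# Toy usage: 5 inventory states, 3 order-quantity actions.
Q = np.zeros((5, 3))
Q = beta_pessimistic_q_update(Q, s=2, a=1, r=-4.0, s_next=3)
print(Q[2, 1])                                            # updated value
print(order_up_to(inventory_position=7, base_stock_level=12))  # order 5 units
```

The single scalar β trades off nominal performance against robustness, which is consistent with the abstract's finding that a fine-tuned β retains good performance under normal conditions while hedging against moderate environment deviations.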
Journal description:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.