Risk-averse supply chain management via robust reinforcement learning

Impact Factor 3.9 | CAS Zone 2 (Engineering & Technology) | JCR Q2, Computer Science, Interdisciplinary Applications
Jing Wang, Christopher L.E. Swartz, Kai Huang
{"title":"Risk-averse supply chain management via robust reinforcement learning","authors":"Jing Wang ,&nbsp;Christopher L.E. Swartz ,&nbsp;Kai Huang","doi":"10.1016/j.compchemeng.2024.108912","DOIUrl":null,"url":null,"abstract":"<div><div>Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, <span><math><mover><mrow><mi>Q</mi></mrow><mrow><mo>ˆ</mo></mrow></mover></math></span>-learning and <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning, are examined against conventional <span><math><mi>Q</mi></math></span>-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that <span><math><mi>Q</mi></math></span>-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned <span><math><mi>β</mi></math></span>-pessimistic <span><math><mi>Q</mi></math></span>-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"192 ","pages":"Article 108912"},"PeriodicalIF":3.9000,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135424003302","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, Q̂-learning and β-pessimistic Q-learning, are examined against conventional Q-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that Q-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned β-pessimistic Q-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.
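As context for the comparison described in the abstract, the following is a minimal illustrative sketch (not the authors' code) of the three decision rules mentioned: standard Q-learning, β-pessimistic Q-learning, and the order-up-to baseline. The state/action discretization, hyperparameters (alpha, gamma, beta), and target level S are assumptions chosen for illustration; β-pessimistic Q-learning is written in its commonly cited tabular form, in which the bootstrap target blends the best and worst next-state action values.

```python
import numpy as np

# Illustrative sketch only: tabular updates on a discretized inventory state.
# Q is an (n_states, n_actions) array; s, a, s_next are integer indices.

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard Q-learning: bootstrap from the best next-state action value."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def beta_pessimistic_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, beta=0.3):
    """beta-pessimistic Q-learning: blend the best and worst next-state action
    values; a larger beta yields more conservative (risk-averse) estimates."""
    blended = (1 - beta) * np.max(Q[s_next]) + beta * np.min(Q[s_next])
    target = r + gamma * blended
    Q[s, a] += alpha * (target - Q[s, a])

def order_up_to(inventory_position, S=100):
    """Baseline order-up-to policy: order enough to raise the inventory
    position back to the target level S (orders cannot be negative)."""
    return max(0, S - inventory_position)
```

With beta = 0 the pessimistic update reduces to standard Q-learning; tuning beta trades nominal performance against robustness to environment deviations, consistent with the abstract's finding that a fine-tuned β-pessimistic agent balances the two.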
Source Journal

Computers & Chemical Engineering (Engineering & Technology — Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles published per year: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.