Online Learning for Dual-Index Policies in Dual-Sourcing Systems

Jingwen Tang, Boxiao Chen, Cong Shi
Manufacturing & Service Operations Management. Published 2023-12-12. DOI: 10.1287/msom.2022.0323

Abstract

Problem definition: We consider a periodic-review dual-sourcing inventory system with a regular source (lower unit cost but longer lead time) and an expedited source (shorter lead time but higher unit cost) under carried-over supply and backlogged demand. Unlike the existing literature, we assume that the firm does not have access to the demand distribution a priori and relies solely on past demand realizations. Even with complete information on the demand distribution, the optimal inventory replenishment policy is well known to be complex and state dependent. Therefore, we focus on a class of popular, easy-to-implement, and near-optimal heuristic policies called dual-index policies.

Methodology/results: The performance measure is the regret, defined as the cost difference between any feasible learning algorithm and the full-information optimal dual-index policy. We develop a nonparametric online learning algorithm that admits a regret upper bound of [Formula: see text], which matches the regret lower bound for any feasible learning algorithm up to a logarithmic factor. Our algorithm integrates stochastic bandits and sample average approximation techniques in an innovative way. As part of our regret analysis, we explicitly prove, via coupling arguments, that the underlying Markov chain is ergodic and converges to its steady state exponentially fast, which could be of independent interest.

Managerial implications: Our work provides practitioners with an easy-to-implement, robust, and provably good online decision support system for managing a dual-sourcing inventory system.

Funding: This work was supported by the Amazon Research Award.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2022.0323.
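
For readers unfamiliar with dual-index policies, the sketch below may help. It is not the algorithm of the paper. It assumes the form of the policy commonly used in the dual-sourcing literature: the firm tracks an expedited inventory position and a regular inventory position and orders up to a separate target level for each. The function names, the cost parameters (`c_r`, `c_e`, `h`, `b`), the zero expedited lead time, and the grid search over past demand samples are all illustrative assumptions. The grid search is only a crude stand-in for sample average approximation and has no bandit component.

```python
import random


def simulate_dual_index(demands, z_e, z_r, lead_r=2,
                        c_r=0.0, c_e=1.0, h=1.0, b=4.0):
    """Average per-period cost of a dual-index policy on one demand path.

    Assumptions (hypothetical, for illustration only): expedited orders
    arrive immediately, regular orders arrive after `lead_r` >= 1 periods,
    and unmet demand is backlogged.
    """
    if not demands:
        raise ValueError("demands must be non-empty")
    inventory = 0.0                # net inventory (negative means backlog)
    pipeline = [0.0] * lead_r      # pipeline[0] arrives in the current period
    total_cost = 0.0
    for d in demands:
        # Expedited position: net inventory plus what arrives this period.
        ip_e = inventory + pipeline[0]
        q_e = max(0.0, z_e - ip_e)
        # Regular position: net inventory plus all outstanding orders,
        # including the expedited order just placed.
        ip_r = inventory + sum(pipeline) + q_e
        q_r = max(0.0, z_r - ip_r)
        # Receive deliveries, then serve (or backlog) demand.
        inventory += pipeline[0] + q_e
        inventory -= d
        total_cost += (c_e * q_e + c_r * q_r
                       + h * max(inventory, 0.0) + b * max(-inventory, 0.0))
        # Advance the regular pipeline by one period.
        pipeline = pipeline[1:] + [q_r]
    return total_cost / len(demands)


def best_on_sample(demands, levels, **params):
    """Grid search for the (z_e, z_r) pair with the lowest sample-path cost."""
    best = None
    for z_e in levels:
        for z_r in levels:
            if z_r < z_e:
                continue           # keep the regular target at or above z_e
            cost = simulate_dual_index(demands, z_e, z_r, **params)
            if best is None or cost < best[0]:
                best = (cost, z_e, z_r)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    past_demands = [rng.randint(0, 10) for _ in range(500)]  # synthetic data
    cost, z_e, z_r = best_on_sample(past_demands, range(0, 31, 2))
    print(f"z_e={z_e}, z_r={z_r}, average cost={cost:.3f}")
```

The search uses only realized demands, which mirrors the setting in which the demand distribution is unknown. Any regret guarantee, however, depends on the learning algorithm developed in the paper, not on this sketch.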