Online Learning for Dual-Index Policies in Dual-Sourcing Systems

Manufacturing & Service Operations Management, Pub Date: 2023-12-12, DOI: 10.1287/msom.2022.0323

Jingwen Tang, Boxiao Chen, Cong Shi



Online Learning for Dual-Index Policies in Dual-Sourcing Systems

Problem definition: We consider a periodic-review dual-sourcing inventory system with a regular source (lower unit cost but longer lead time) and an expedited source (shorter lead time but higher unit cost) under carried-over supply and backlogged demand. Unlike existing literature, we assume that the firm does not have access to the demand distribution a priori and relies solely on past demand realizations. Even with complete information on the demand distribution, it is well known in the literature that the optimal inventory replenishment policy is complex and state dependent. Therefore, we focus our attention on a class of popular, easy-to-implement, and near-optimal heuristic policies called the dual-index policy.

Methodology/results: The performance measure is the regret, defined as the cost difference of any feasible learning algorithm against the full-information optimal dual-index policy. We develop a nonparametric online learning algorithm that admits a regret upper bound of [Formula: see text], which matches the regret lower bound for any feasible learning algorithms up to a logarithmic factor. Our algorithm integrates stochastic bandits and sample average approximation techniques in an innovative way. As part of our regret analysis, we explicitly prove that the underlying Markov chain is ergodic and converges to its steady state exponentially fast via coupling arguments, which could be of independent interest.

Managerial implications: Our work provides practitioners with an easy-to-implement, robust, and provably good online decision support system for managing a dual-sourcing inventory system.

Funding: This work was supported by the Amazon Research Award.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2022.0323.
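For readers unfamiliar with the dual-index policy named in the abstract, the following is a minimal simulation sketch of how such a policy is commonly described: an expedited order raises the short-horizon inventory position to a level `z_e`, and a regular order then raises the total inventory position to a level `z_r`. The event timing, the lead times, the cost parameters, and the function names `average_cost` and `best_on_sample` are all illustrative assumptions and are not taken from the paper. The brute-force grid search is only a stand-in for evaluating policies on past demand samples; it is not the authors' learning algorithm.

```python
import random


def average_cost(demands, z_e, z_r, l_e=0, l_r=2,
                 c_e=2.0, c_r=1.0, h=1.0, b=5.0):
    """Average per-period cost of a dual-index policy on one demand path.

    Assumed timing (hypothetical): place orders, receive due orders,
    observe demand, then pay ordering, holding, and backlog costs.
    Requires 0 <= l_e < l_r.
    """
    net = 0.0                       # net inventory (negative means backlog)
    pipeline = [0.0] * (l_r + 1)    # pipeline[k]: units arriving k periods from now
    total = 0.0
    for d in demands:
        # Expedited index: net inventory plus everything due within l_e periods.
        ip_e = net + sum(pipeline[:l_e + 1])
        q_e = max(0.0, z_e - ip_e)
        pipeline[l_e] += q_e
        # Regular index: net inventory plus all outstanding orders.
        ip_r = net + sum(pipeline)
        q_r = max(0.0, z_r - ip_r)
        pipeline[l_r] += q_r
        # Receive what is due this period, then serve demand (backlog allowed).
        net += pipeline.pop(0)
        pipeline.append(0.0)
        net -= d
        total += c_e * q_e + c_r * q_r + h * max(net, 0.0) + b * max(-net, 0.0)
    return total / len(demands)


def best_on_sample(demands, levels, **kwargs):
    """Brute-force search for the (z_e, z_r) pair with the lowest sample cost."""
    candidates = [(ze, zr) for ze in levels for zr in levels if ze <= zr]
    return min(candidates, key=lambda z: average_cost(demands, z[0], z[1], **kwargs))


if __name__ == "__main__":
    rng = random.Random(0)
    sample = [rng.randint(0, 4) for _ in range(500)]   # synthetic demand, for illustration only
    z_e, z_r = best_on_sample(sample, range(0, 12))
    print(z_e, z_r, average_cost(sample, z_e, z_r))
```

In this sketch, the regret of a learning algorithm over a horizon would be its cumulative cost minus the cumulative cost of the best fixed `(z_e, z_r)` pair, which is what the abstract calls the full-information optimal dual-index policy.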
