Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems with Demand Learning

Operations Research eJournal | Pub Date: 2019-09-19 | DOI: 10.2139/ssrn.3456834

Boxiao Chen, Cong Shi

Citations: 12

Abstract

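We consider a periodic-review dual-sourcing inventory system, in which the expedited supplier is faster and more costly, while the regular supplier is slower and cheaper. Under full demand distributional information, it is well-known that the optimal policy is extremely complex but the celebrated Tailored Base-Surge (TBS) policy performs near optimally. Under such a policy, a constant order is placed at the regular source in each period, while the order placed at the expedited source follows a simple order-up-to rule. In this paper, we assume that the firm does not know the demand distribution a priori, and makes adaptive inventory ordering decisions in each period based only on the past sales (a.k.a. censored demand) data. The standard performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal Tailored Base-Surge policy, we develop the first nonparametric learning algorithm that admits a regret bound of O(T^{1/2} (log T)^{3} loglog T), which is provably tight up to a logarithmic factor. Leveraging the structure of this problem, our approach combines the power of bisection search and stochastic gradient descent and also involves a delicate high probability coupling argument between our and the clairvoyant optimal system dynamics. We also develop several technical results that are of independent interest.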
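
To make the TBS structure and the regret measure concrete, here is a minimal Python sketch. It is not the authors' learning algorithm: it only simulates a fixed TBS policy (a constant regular order r every period, plus an expedited order that raises inventory up to a level S) and compares its cost with a benchmark. Everything beyond the abstract is an assumption made for illustration: zero expedited lead time, a regular lead time of two periods, lost sales for unmet demand, the cost parameters in `Costs`, and the hypothetical helper names `simulate_tbs`, `best_tbs`, and `regret`. The benchmark is a grid search over (r, S) on one simulated demand path, a crude stand-in for the full-information optimal TBS policy; the regret in the abstract compares expected costs instead.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    # Hypothetical per-unit costs, chosen only for illustration.
    regular: float = 1.0    # slower, cheaper source
    expedited: float = 2.0  # faster, costlier source
    holding: float = 0.5    # per unit on hand at the end of a period
    lost_sale: float = 4.0  # per unit of unmet demand (lost-sales assumption)


def simulate_tbs(r, S, demands, lead_time=2, costs=Costs()):
    """Total cost of a TBS policy (r, S) over a demand path.

    Assumed dynamics: expedited orders arrive immediately, regular orders
    arrive `lead_time` periods after being placed, unmet demand is lost,
    and the system starts with no inventory and an empty pipeline.
    """
    pipeline = [0.0] * lead_time  # regular orders in transit, oldest first
    on_hand = 0.0
    total = 0.0
    for d in demands:
        pipeline.append(r)          # base: the same regular order every period
        on_hand += pipeline.pop(0)  # order placed lead_time periods ago arrives
        e = max(0.0, S - on_hand)   # surge: expedite up to S
        on_hand += e
        sales = min(on_hand, d)     # the firm observes sales, not demand
        lost = d - sales
        on_hand -= sales
        total += (costs.regular * r + costs.expedited * e
                  + costs.holding * on_hand + costs.lost_sale * lost)
    return total


def best_tbs(demands, r_grid, S_grid, **kwargs):
    """Grid-search stand-in for the clairvoyant TBS benchmark on one path."""
    return min((simulate_tbs(r, S, demands, **kwargs), r, S)
               for r in r_grid for S in S_grid)


def regret(policy_cost, benchmark_cost):
    """Cost of a policy minus the benchmark cost."""
    return policy_cost - benchmark_cost


if __name__ == "__main__":
    rng = random.Random(0)
    demands = [rng.uniform(0.0, 10.0) for _ in range(500)]
    grid = [0.5 * i for i in range(21)]  # 0.0, 0.5, ..., 10.0
    best_cost, r_star, S_star = best_tbs(demands, grid, grid)
    naive_cost = simulate_tbs(5.0, 5.0, demands)
    print(f"best r={r_star}, S={S_star}, regret of (5, 5): "
          f"{regret(naive_cost, best_cost):.1f}")
```

Because the grid contains the naive choice r = S = 5, its computed regret is nonnegative by construction. A learning algorithm would instead have to pick (r, S) period by period from observed sales alone; the abstract describes combining bisection search and stochastic gradient descent for that task.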
