Tailored Base-Surge Policies in Dual-Sourcing Inventory Systems with Demand Learning
Boxiao Chen, Cong Shi
Operations Research eJournal, published 2019-09-19. DOI: https://doi.org/10.2139/ssrn.3456834
We consider a periodic-review dual-sourcing inventory system in which the expedited supplier is faster and more costly, while the regular supplier is slower and cheaper. Under full demand distributional information, it is well known that the optimal policy is extremely complex, but the celebrated Tailored Base-Surge (TBS) policy performs near-optimally. Under such a policy, a constant order is placed at the regular source in each period, while the order placed at the expedited source follows a simple order-up-to rule. In this paper, we assume that the firm does not know the demand distribution a priori and makes adaptive inventory ordering decisions in each period based only on past sales (a.k.a. censored demand) data. The standard performance measure is regret, which is the cost difference between a feasible learning algorithm and the clairvoyant (full-information) benchmark. When the benchmark is chosen to be the (full-information) optimal Tailored Base-Surge policy, we develop the first nonparametric learning algorithm that admits a regret bound of O(T^{1/2} (log T)^{3} log log T), which is provably tight up to a logarithmic factor. Leveraging the structure of this problem, our approach combines the power of bisection search and stochastic gradient descent, and also involves a delicate high-probability coupling argument between the dynamics of our system and those of the clairvoyant optimal system. We also develop several technical results that are of independent interest.
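To make the TBS rule described above concrete, the following is a minimal simulation sketch, not the learning algorithm of the paper. It rests on several assumptions that the abstract does not state: unmet demand is lost (so only sales are observed), the expedited order arrives instantly, the regular pipeline is in steady state so exactly r units arrive every period, and the regular unit cost is normalized to zero so that c_e is the expedited premium. The function name simulate_tbs and the cost parameters c_e, h and p are hypothetical.

```python
def simulate_tbs(r, S, demands, c_e=2.0, h=1.0, p=4.0, start_inventory=0.0):
    """Average per-period cost of a TBS policy (r, S) on one demand path.

    Illustrative assumptions (not taken from the paper): lost sales,
    zero expedited lead time, steady-state regular pipeline, and a
    regular unit cost normalized to zero.
    """
    inventory = start_inventory
    total_cost = 0.0
    sales = []
    for d in demands:
        # Constant regular order of size r arrives.
        inventory += r
        # Expedited source: order up to level S if inventory is below it.
        q_exp = max(0.0, S - inventory)
        inventory += q_exp
        # Demand is censored: the firm only observes what it sold.
        sold = min(inventory, d)
        lost = d - sold
        inventory -= sold
        # Expedited premium + holding cost on leftovers + lost-sales penalty.
        total_cost += c_e * q_exp + h * inventory + p * lost
        sales.append(sold)
    return total_cost / len(demands), sales
```

Under these assumptions, the regret of a learning algorithm over T periods could be estimated empirically as T times the gap between its average cost and the average cost returned by simulate_tbs for the best (r, S) pair on the same demand path.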