On the Hardness of Learning from Censored Demand

ERN: Statistical Decision Theory; Operations Research (Topic). Pub Date: 2021-02-04. DOI: 10.2139/ssrn.3509255

G. Lugosi, Mihalis G. Markakis, Gergely Neu


Citations: 6

Abstract


Problem definition: We consider a repeated newsvendor problem where the inventory manager has no prior information about the demand, and can access only censored data. The manager needs to simultaneously "explore" and "exploit" with her inventory decisions, in order to minimize the cumulative cost that the firm incurs. We study the hardness of the problem disentangled from any probabilistic assumptions on the demand, and we develop inventory control policies with guaranteed performance.
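To make the setting concrete, here is a minimal sketch of one period of a newsvendor problem with censored feedback. The per-unit cost parameters `holding_cost` and `shortage_cost` and all function names are illustrative assumptions; the abstract does not specify the cost structure.

```python
def newsvendor_cost(order, demand, holding_cost=1.0, shortage_cost=2.0):
    """Cost of one period: pay for leftover stock and for unmet demand.

    The per-unit costs are hypothetical defaults, not values from the paper.
    """
    overage = max(order - demand, 0)   # unsold units (perishable, so wasted)
    underage = max(demand - order, 0)  # demand that could not be served
    return holding_cost * overage + shortage_cost * underage


def observed_sales(order, demand):
    """The manager sees sales only, never demand itself."""
    return min(order, demand)


def is_censored(order, sales):
    """If everything sold out, true demand may exceed what was observed."""
    return sales >= order


# Ordering 5 units when demand is 8: sales are 5 and demand is censored.
sales = observed_sales(5, 8)
print(sales, is_censored(5, sales), newsvendor_cost(5, 8))
```

Note that in the censored case the manager cannot evaluate the shortage term of the cost directly, which is what makes larger orders informative as well as costly.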

Academic/practical relevance: The problem is motivated by multi-period inventory management of perishable goods, such as newspapers, fresh food, or certain pharmaceutical products, where demand needs to be "learned" only through sales. Demand for many goods is non-stationary, e.g., exhibiting trends and/or seasonalities, yet existing literature offers policies that are tailored to, or facilitated by, time stationarity.

Methodology: We adopt the regret criterion for performance evaluation purposes. By combining concepts and results from partial monitoring, we couple a carefully designed cost estimator to the well-known Exponentially Weighted Forecaster.
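For readers unfamiliar with the forecaster named above, the following is a minimal sketch of a standard exponentially weighted forecaster over a finite set of actions (for example, candidate order quantities). It assumes the full loss vector is revealed each round. In the paper's policy, estimated costs built from censored sales would take the place of these losses; that estimator is not described in the abstract and is not reproduced here. The learning rate `eta` and all names are illustrative assumptions.

```python
import math
import random


def exponential_weights_probs(cumulative_losses, eta):
    """Probability of each action, proportional to exp(-eta * cumulative loss)."""
    smallest = min(cumulative_losses)  # shift for numerical stability
    weights = [math.exp(-eta * (loss - smallest)) for loss in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]


def run_forecaster(loss_rounds, eta, seed=0):
    """Play one randomized action per round, then update cumulative losses.

    loss_rounds: list of per-round loss vectors, one entry per action.
    Assumption: losses are fully observed (no censoring) in this sketch.
    """
    rng = random.Random(seed)
    n_actions = len(loss_rounds[0])
    cumulative = [0.0] * n_actions
    chosen = []
    for losses in loss_rounds:
        probs = exponential_weights_probs(cumulative, eta)
        action = rng.choices(range(n_actions), weights=probs)[0]
        chosen.append(action)
        for i, loss in enumerate(losses):
            cumulative[i] += loss
    return chosen, cumulative


# Action 0 is always better, so its probability grows over the rounds.
chosen, cumulative = run_forecaster([[0.0, 1.0]] * 50, eta=1.0)
print(cumulative, exponential_weights_probs(cumulative, 1.0))
```

Randomizing over actions, rather than always playing the current best one, is what lets this kind of policy keep exploring while mostly exploiting.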

Results: We develop a simple and easy-to-interpret policy that achieves optimal scaling of the expected regret (up to logarithmic factors) with respect to both the number of time periods and available actions. We demonstrate the flexibility of our approach by extending these performance guarantees to: (i) tracking regret, a powerful notion of regret that uses a large class of non-stationary action sequences as benchmark; (ii) single-warehouse multi-retailer inventory management of a perishable product.
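As a rough illustration of the regret criterion, the sketch below computes regret against the best fixed action in hindsight for a given sequence of choices. It covers only this static benchmark; tracking regret compares against non-stationary action sequences, whose exact class the abstract does not specify. The function name and the data layout are illustrative assumptions.

```python
def static_regret(cost_rounds, chosen_actions):
    """Cumulative cost incurred minus the cost of the best single action.

    cost_rounds: list of per-round cost vectors, one entry per action.
    chosen_actions: index of the action played in each round.
    """
    incurred = sum(costs[a] for costs, a in zip(cost_rounds, chosen_actions))
    n_actions = len(cost_rounds[0])
    best_fixed = min(
        sum(costs[i] for costs in cost_rounds) for i in range(n_actions)
    )
    return incurred - best_fixed


# Always playing action 0 costs 2 in total; action 1 would have cost 1.
print(static_regret([[1, 0], [0, 1], [1, 0]], [0, 0, 0]))  # prints 1
```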

Managerial implications: Our results lead to two important insights: the benefit from "information stalking" and the cost of censoring are both insignificant in this setting, which paves the way for the design of applicable heuristic policies. Further supported by numerical experiments, our findings illustrate the performance loss that can be incurred when policies designed under stationarity assumptions are applied to non-stationary environments.
