On the Hardness of Learning from Censored Demand
G. Lugosi, Mihalis G. Markakis, Gergely Neu
ERN: Statistical Decision Theory; Operations Research (Topic), published 2021-02-04
DOI: 10.2139/ssrn.3509255 (https://doi.org/10.2139/ssrn.3509255)
Problem definition: We consider a repeated newsvendor problem where the inventory manager has no prior information about the demand, and can access only censored data. The manager needs to simultaneously "explore" and "exploit" with her inventory decisions, in order to minimize the cumulative cost that the firm incurs. We study the hardness of the problem disentangled from any probabilistic assumptions on the demand, and we develop inventory control policies with guaranteed performance.
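For concreteness, the per-period newsvendor cost and the censored observation can be written as follows. This is a standard textbook formalization rather than notation taken from the paper; the symbols y_t, d_t, s_t, h, and b are assumptions introduced here for illustration.

```latex
% Assumed notation (not from the source):
%   y_t = stocking level chosen in period t,  d_t = demand in period t,
%   h   = per-unit overage (holding/disposal) cost,
%   b   = per-unit underage (lost-sale) cost.
\begin{align*}
  c(y_t, d_t) &= h\,(y_t - d_t)^{+} + b\,(d_t - y_t)^{+}, \\
  s_t &= \min\{y_t, d_t\}
  \quad \text{(observed sales; demand is censored whenever } s_t = y_t\text{)}.
\end{align*}
```

Under this reading, the manager sees only s_t, so she cannot tell how much demand was lost in any period in which she sells out.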
Academic/practical relevance: The problem is motivated by multi-period inventory management of perishable goods, such as newspapers, fresh food, or certain pharmaceutical products, where demand needs to be "learned" only through sales. Demand for many goods is non-stationary, e.g., exhibiting trends and/or seasonalities, yet the existing literature offers policies that are tailored to, or facilitated by, time stationarity.
Methodology: We adopt the regret criterion for performance evaluation purposes. By combining concepts and results from partial monitoring, we couple a carefully designed cost estimator to the well-known Exponentially Weighted Forecaster.
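To illustrate the general idea of pairing a cost estimator with an exponentially weighted forecaster, here is a minimal sketch. It is not the authors' policy or estimator: it assumes integer stocking levels 0..max_level, the textbook newsvendor cost with hypothetical parameters h (overage) and b (underage), and a generic importance-weighted estimator of the kind used in bandit problems with side observations. It relies on one observation: for any level at or below the stocked quantity, the cost is computable from censored sales up to a term that does not depend on the action, and such terms do not affect regret. All function and parameter names are illustrative.

```python
import math
import random


def newsvendor_cost(level, demand, h, b):
    """True per-period cost: h per unsold unit, b per unit of lost sales."""
    return h * max(level - demand, 0) + b * max(demand - level, 0)


def surrogate_cost(level, sales, h, b, max_level):
    """True cost of `level` plus b * (max_level - demand), an action-independent shift.

    Valid for any level not exceeding the quantity actually stocked, because
    then min(level, demand) == min(level, sales), so censored sales suffice.
    """
    m = min(level, sales)
    return h * (level - m) + b * (max_level - m)


def run_policy(demands, max_level, h=1.0, b=2.0, eta=0.1, seed=0):
    """Exponential weights over stocking levels 0..max_level (assumes max_level >= 1)."""
    rng = random.Random(seed)
    levels = list(range(max_level + 1))
    scale = (h + b) * max_level  # keeps scaled surrogate costs in [0, 1]
    log_w = [0.0] * len(levels)
    p = [1.0 / len(levels)] * len(levels)
    total_cost = 0.0
    for d in demands:
        # Sampling distribution proportional to exp(log_w), computed stably.
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        z = sum(w)
        p = [v / z for v in w]
        y = rng.choices(levels, weights=p)[0]
        sales = min(y, d)  # the only feedback the policy uses
        total_cost += newsvendor_cost(y, d, h, b)  # bookkeeping only
        # q[k] = probability that the sampled level is at least k,
        # i.e. the probability that level k's surrogate cost is observable.
        q = [0.0] * len(levels)
        tail = 0.0
        for k in reversed(levels):
            tail += p[k]
            q[k] = tail
        # Importance-weighted update for every level revealed by stocking y.
        for k in levels[: y + 1]:
            est = surrogate_cost(k, sales, h, b, max_level) / (scale * q[k])
            log_w[k] -= eta * est
    return total_cost, p


if __name__ == "__main__":
    demand_rng = random.Random(1)
    demands = [demand_rng.randint(3, 7) for _ in range(2000)]
    cost, p = run_policy(demands, max_level=10)
    print(round(cost, 1), max(range(len(p)), key=p.__getitem__))
```

The learning rate eta and the cost parameters are arbitrary choices made for the sketch; a tuned value of eta would depend on the horizon and the number of levels.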
Results: We develop a simple and easy-to-interpret policy that achieves optimal scaling of the expected regret (up to logarithmic factors) with respect to both the number of time periods and the number of available actions. We demonstrate the flexibility of our approach by extending these performance guarantees to: (i) tracking regret, a powerful notion of regret that uses a large class of non-stationary action sequences as the benchmark; and (ii) single-warehouse multi-retailer inventory management of a perishable product.
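As a point of reference, one common way to formalize the two notions of regret is sketched below. The notation is an assumption made here, not the paper's: c(y, d) denotes the per-period cost of stocking y under demand d, T the number of periods, \mathcal{Y} the action set, and S a hypothetical bound on the number of switches allowed in the benchmark sequence, with the demand sequence treated as fixed in advance.

```latex
% Assumed notation (not from the source): c, T, \mathcal{Y}, S as described above.
\begin{align*}
  R_T &= \mathbb{E}\Big[\sum_{t=1}^{T} c(y_t, d_t)\Big]
        - \min_{y \in \mathcal{Y}} \sum_{t=1}^{T} c(y, d_t), \\
  R_T^{\mathrm{track}}(S) &= \mathbb{E}\Big[\sum_{t=1}^{T} c(y_t, d_t)\Big]
        - \min_{\substack{(u_1,\dots,u_T) \in \mathcal{Y}^{T} \\ \#\{t \,:\, u_t \neq u_{t+1}\} \le S}}
          \sum_{t=1}^{T} c(u_t, d_t).
\end{align*}
```

With S = 0 the second quantity reduces to the first, since the benchmark is then a single fixed action.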
Managerial implications: Our results lead to two important insights: the benefit from “information stalking” as well as the cost of censoring are insignificant in this setting, paving the way for the design of applicable heuristic policies. Further supported by numerical experiments, our findings illustrate the performance loss that can be incurred when policies designed under stationarity assumptions are applied to non-stationary environments.