Temporal Fairness in Learning and Earning: Price Protection Guarantee and Phase Transitions
Authors: Qing Feng, Ruihao Zhu, Stefanus Jasin
Published in: Proceedings of the 24th ACM Conference on Economics and Computation
Publication date: 2023-07-07
DOI: 10.1145/3580507.3597668 (https://doi.org/10.1145/3580507.3597668)
Citation count: 1
Abstract
Motivated by the prevalence of the "price protection guarantee", which helps promote temporal fairness in dynamic pricing, we study the impact of this policy on the design of online learning algorithms for data-driven dynamic pricing with initially unknown customer demand. Under the price protection guarantee, a customer who purchased a product in the past can receive a refund from the seller during the so-called price protection period (typically defined as a certain time window after the purchase date) if the seller decides to lower the price. We consider a setting where a firm sells a product over a horizon of T time steps. For this setting, we characterize how the value of M, the length of the price protection period, affects the optimal regret of the learning process. Our contributions can be summarized as follows:
• Inadequacy of Existing Algorithms: We demonstrate that directly applying conventional dynamic pricing algorithms, such as the Upper Confidence Bound (UCB) algorithm and the Thompson Sampling (TS) algorithm, may incur linear regret in the presence of price protection. We support this claim with both theoretical and numerical evidence.
• Regret Lower and Upper Bounds: We show that the optimal regret is [EQUATION] by first establishing a fundamental impossibility regime with a novel refund-aware regret lower bound analysis. Then, we propose LEAP, a phased-exploration algorithm for Learning and EArning under Price Protection, which matches this lower bound up to logarithmic factors, or even doubly logarithmic factors when only two prices are available to the seller.
• Phase Transitions of Optimal Regret: Our results reveal surprising phase transitions of the optimal regret with respect to M. Specifically, when M is not too large, the optimal regret does not differ substantially from that of the classic setting with no price protection guarantee. We also show that there is an upper limit on how much the optimal regret can deteriorate as M grows large.
• Numerical Simulations: Finally, we conduct extensive numerical experiments to show the benefit of LEAP over other heuristic methods for this problem.
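To make the refund mechanism concrete, the following is a minimal sketch of how price protection can erode revenue when prices fluctuate. It is not taken from the paper: the function name `net_revenue` and the refund rule are assumptions made for illustration. Here a customer is assumed to be refunded the difference between the price paid and the lowest price posted during the M time steps after the purchase.

```python
def net_revenue(prices, sales, M):
    """Gross revenue minus refunds under an ASSUMED price protection rule.

    prices[t] is the price posted at step t, sales[t] the units sold at step t,
    and M the length of the price protection period (in time steps).
    Assumption (not from the paper): each buyer is refunded the gap between
    the price paid and the lowest price posted in the following M steps.
    """
    gross = 0.0
    refunds = 0.0
    for t, (price, units) in enumerate(zip(prices, sales)):
        gross += price * units
        # Prices posted during the M steps after the purchase.
        window = prices[t + 1 : t + 1 + M]
        if window:
            refunds += units * max(0.0, price - min(window))
    return gross - refunds


# A price path that alternates between high and low prices,
# with one unit sold at every step (hypothetical numbers).
prices = [10, 8, 10, 6]
sales = [1, 1, 1, 1]

print(net_revenue(prices, sales, M=0))  # no protection: 34.0
print(net_revenue(prices, sales, M=1))  # one-step protection: 28.0
print(net_revenue(prices, sales, M=3))  # three-step protection: 24.0
```

Under this assumed rule, every price decrease within the protection window triggers refunds to earlier buyers, so a policy that keeps switching between prices pays more in refunds as M grows. This gives some intuition for why frequent price changes become costly under price protection.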