Temporal Fairness in Learning and Earning: Price Protection Guarantee and Phase Transitions

Proceedings of the 24th ACM Conference on Economics and Computation. Pub Date: 2023-07-07. DOI: 10.1145/3580507.3597668

Qing Feng, Ruihao Zhu, Stefanus Jasin

Citations: 1

Abstract

Motivated by the prevalence of the "price protection guarantee", which helps to promote temporal fairness in dynamic pricing, we study the impact of such a policy on the design of online learning algorithms for data-driven dynamic pricing with initially unknown customer demand. Under the price protection guarantee, a customer who purchased a product in the past can receive a refund from the seller during the so-called price protection period (typically defined as a certain time window after the purchase date) if the seller decides to lower the price. We consider a setting where a firm sells a product over a horizon of T time steps. For this setting, we characterize how the value of M, the length of the price protection period, affects the optimal regret of the learning process. Our contributions can be summarized as follows:

• Inadequacy of Existing Algorithms: We demonstrate that directly applying conventional dynamic pricing algorithms, such as the Upper Confidence Bound (UCB) algorithm and the Thompson Sampling (TS) algorithm, may incur linear regret in the presence of price protection. We use both theoretical and numerical evidence to support this claim.

• Regret Lower and Upper Bounds: We show that the optimal regret is [EQUATION] by first establishing a fundamental impossibility regime with a novel refund-aware regret lower bound analysis. Then, we propose LEAP, a phased-exploration algorithm for Learning and EArning under Price Protection, which matches this lower bound up to logarithmic factors, or even doubly logarithmic factors when only two prices are available to the seller.

• Phase Transitions of Optimal Regret: Our results reveal surprising phase transitions of the optimal regret with respect to M. Specifically, when M is not too large, the optimal regret does not differ substantially from that of the classic setting with no price protection guarantee. We also show that there is an upper limit on how much the optimal regret can deteriorate as M grows large.

• Numerical Simulations: Finally, we conduct extensive numerical experiments to show the benefit of LEAP over other heuristic methods for this problem.
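The refund mechanism described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function name `net_revenue`, the per-step `sales` counts, and the exact refund rule (the gap between the purchase price and the lowest price posted in the following M steps) are assumptions made only for illustration.

```python
from typing import Sequence


def net_revenue(prices: Sequence[float], sales: Sequence[int], M: int) -> float:
    """Revenue net of refunds under an assumed price protection rule.

    Assumption (not from the paper): a customer who buys at step t pays
    prices[t] and is later refunded the gap to the lowest price posted
    in steps t+1 .. t+M, whenever that price is lower.
    """
    if len(prices) != len(sales):
        raise ValueError("prices and sales must have the same length")
    total = 0.0
    for t, (p, n) in enumerate(zip(prices, sales)):
        # Prices posted during this customer's protection window.
        window = prices[t + 1 : t + 1 + M]
        lowest_later = min(window, default=p)
        refund = max(p - lowest_later, 0.0)
        total += n * (p - refund)
    return total


# Hypothetical example: a price path that alternates between two prices.
prices = [10, 8, 10, 8]
sales = [1, 1, 1, 1]
print(net_revenue(prices, sales, M=0))  # no protection window
print(net_revenue(prices, sales, M=1))  # one-step protection window
```

In this toy example every sale at the higher price is refunded down to the lower price once M is at least 1, which hints at why algorithms that switch prices frequently can lose revenue under price protection.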
