Online Learning and Optimization of (Some) Cyclic Pricing Policies in the Presence of Patient Customers

ERN: Consumption | Pub Date: 2018-03-19 | DOI: 10.2139/ssrn.3144018

Huanan Zhang, Stefanus Jasin


Citations: 6

Abstract



Problem definition: We consider the problem of joint learning and optimization of cyclic pricing policies in the presence of patient customers. In our problem, some customers are patient, and they are willing to wait in the system for several periods to make a purchase until the price is lower than their valuation. The seller does not know the joint distribution of customers’ valuation and patience level a priori and can only learn this from the realized total sales in every period.

Academic/practical relevance: The revenue management problem with patient customers has been studied in the literature as an optimization problem, and cyclic policies have been shown to be optimal in some cases. We contribute to the literature by studying this problem from the joint learning and optimization perspective. Indeed, to the best of our knowledge, our paper is the first work that studies online learning and optimization for multiperiod pricing with patient customers.

Methodology: We introduce new dynamic programming formulations for this problem, and we develop two nontrivial upper confidence bound–based learning algorithms.

Results: We analyze both decreasing cyclic policies and so-called threshold-regulated policies, which contain both the decreasing cyclic policies and the nested decreasing cyclic policies. We show that our learning algorithms for these policies converge to the optimal clairvoyant decreasing cyclic policy and threshold-regulated policy at a near-optimal rate.

Managerial implications: Our proposed algorithms perform significantly better than benchmark algorithms that either ignore the patient customer characteristic or simply use the standard estimate-then-optimize framework, which does not encourage enough exploration; this highlights the importance of “smart learning” in the context of data-driven decision making. In addition, our numerical results also show that combining our algorithms with smart estimation methods, such as linear interpolation or least squares estimation, can significantly improve their empirical performance; this highlights the benefit of combining smart learning with smart estimation, which further increases the practical viability of the algorithms.
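The demand setting described in the abstract can be illustrated with a toy simulation. The sketch below is not the paper's formulation or either of its learning algorithms; it only assumes that each customer has a valuation and a patience level (the number of extra periods they will wait), that a customer buys in the first period in which the price is at or below their valuation, and that the seller observes only total sales per period. The function names (`simulate_sales`, `cyclic_prices`, `revenue`) and the example customers are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical customer type: (valuation, patience), where patience is the
# number of extra periods the customer waits after arriving.
Customer = Tuple[float, int]


def simulate_sales(prices: Sequence[float],
                   arrivals: Sequence[Sequence[Customer]]) -> List[int]:
    """Return total units sold in each period (all the seller observes)."""
    sales = [0] * len(prices)
    waiting: List[Customer] = []  # customers carried over from earlier periods
    for t, price in enumerate(prices):
        pool = waiting + list(arrivals[t])
        waiting = []
        for valuation, periods_left in pool:
            if price <= valuation:
                sales[t] += 1  # buys at the first acceptable price
            elif periods_left > 0:
                waiting.append((valuation, periods_left - 1))  # keeps waiting
            # otherwise the customer leaves without buying
    return sales


def cyclic_prices(cycle: Sequence[float], horizon: int) -> List[float]:
    """Repeat a price cycle over the horizon."""
    return [cycle[t % len(cycle)] for t in range(horizon)]


def revenue(prices: Sequence[float], sales: Sequence[int]) -> float:
    return sum(p * s for p, s in zip(prices, sales))


# Assumed example: every period, one impatient high-valuation customer and
# one patient low-valuation customer arrive.
horizon = 6
arrivals = [[(10, 0), (6, 2)] for _ in range(horizon)]

for cycle in ([10, 8, 6], [6], [10]):
    prices = cyclic_prices(cycle, horizon)
    sales = simulate_sales(prices, arrivals)
    print(cycle, sales, revenue(prices, sales))
```

In this assumed example the decreasing cycle `[10, 8, 6]` earns more than either fixed price, because patient low-valuation customers accumulate and buy together when the price reaches 6. A learning algorithm in the spirit of the abstract would have to infer this from the per-period sales totals alone, which the sketch does not attempt.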

