Rong Jin, D. Simchi-Levi, L. Wang, Xinshang Wang, Sen Yang
DOI: 10.2139/ssrn.3342761 (https://doi.org/10.2139/ssrn.3342761)
Journal: CompSciRN: Algorithms (Topic)
Publication date: 2019-02-26
Citations: 5
Shrinking the Upper Confidence Bound: A Dynamic Product Selection Problem for Urban Warehouses
The recent rising popularity of ultrafast delivery services on retail platforms fuels the increasing use of urban warehouses, whose proximity to customers makes fast deliveries viable. The space limit in urban warehouses poses a problem for such online retailers: the number of stock keeping units (SKUs) they carry is no longer “the more, the better,” yet it can still be significantly large, reaching hundreds or thousands in a product category. In this paper, we study algorithms for dynamically selecting a large number of products (i.e., SKUs) with top customer purchase probabilities on the fly, from an ocean of potential products to offer on retailers’ ultrafast delivery platforms. We distill the product selection problem into a semibandit model with linear generalization. There are in total N arms corresponding to N products, each with a feature vector of dimension d. The player pulls K arms in each period and observes the bandit feedback from each of the pulled arms. We focus on the setting where K is much greater than the number of total time periods T or the dimension of product features d. We first analyze a standard Upper Confidence Bound (UCB) algorithm and show its regret bound can be expressed as the sum of a T-independent part and a T-dependent part, which we refer to as “fixed cost” and “variable cost,” respectively. To reduce the fixed cost for large K values, we propose a novel online learning algorithm, which iteratively shrinks the upper confidence bounds within each period, and show its fixed cost is reduced by a factor of d. Moreover, we test the algorithms on an industrial data set from Alibaba Group. Experimental results show that our new algorithm reduces the total regret of the standard UCB algorithm by at least 10%. This paper was accepted by J. George Shanthikumar, big data analytics.
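The abstract describes the semibandit model and the two selection rules only in words. The following is a minimal sketch, not the authors' implementation. It assumes a linear purchase probability x·θ, a ridge prior with λ = 1, and a fixed exploration weight `alpha`, none of which the abstract specifies. All function names are hypothetical. In particular, `select_top_k_shrinking` is just one plausible reading of "iteratively shrinks the upper confidence bounds within each period": after each pick it folds the chosen feature vector into the inverse design matrix, so arms similar to those already chosen lose some of their exploration bonus.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def sherman_morrison(A_inv, x):
    """Return (A + x x^T)^{-1} computed from A^{-1}, for symmetric A."""
    Ax = mat_vec(A_inv, x)
    denom = 1.0 + dot(x, Ax)
    n = len(x)
    return [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(n)]
            for i in range(n)]

def ucb(x, theta_hat, A_inv, alpha):
    """Point estimate plus alpha times the width sqrt(x^T A^{-1} x)."""
    width = max(0.0, dot(x, mat_vec(A_inv, x))) ** 0.5
    return dot(x, theta_hat) + alpha * width

def select_top_k(features, theta_hat, A_inv, alpha, k):
    """Standard rule: score every arm once, pull the k highest."""
    order = sorted(range(len(features)),
                   key=lambda i: ucb(features[i], theta_hat, A_inv, alpha),
                   reverse=True)
    return order[:k]

def select_top_k_shrinking(features, theta_hat, A_inv, alpha, k):
    """Assumed variant (requires k <= number of arms): after each pick,
    update A_inv with the chosen features so remaining widths shrink."""
    remaining = list(range(len(features)))
    chosen = []
    for _ in range(k):
        best = max(remaining,
                   key=lambda i: ucb(features[i], theta_hat, A_inv, alpha))
        chosen.append(best)
        remaining.remove(best)
        A_inv = sherman_morrison(A_inv, features[best])
    return chosen

def run(select, features, theta_true, k, periods, alpha=1.0, seed=0):
    """Simulate Bernoulli purchase feedback; return total purchases."""
    rng = random.Random(seed)
    d = len(theta_true)
    # Identity start corresponds to the assumed ridge prior, lambda = 1.
    A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    total = 0
    for _ in range(periods):
        theta_hat = mat_vec(A_inv, b)  # ridge estimate A^{-1} b
        for i in select(features, theta_hat, A_inv, alpha, k):
            x = features[i]
            p = min(1.0, max(0.0, dot(x, theta_true)))
            reward = 1 if rng.random() < p else 0
            total += reward
            A_inv = sherman_morrison(A_inv, x)  # semibandit feedback per pulled arm
            b = [bj + reward * xj for bj, xj in zip(b, x)]
    return total
```

As a small illustration, take three arms with features `[1, 0]`, `[1, 0]`, `[0, 1]`, a zero estimate, an identity `A_inv`, `alpha = 1`, and `k = 2`. The standard rule returns arms `[0, 1]`, two identical products. The shrinking variant returns `[0, 2]`, because picking arm 0 lowers the confidence width of its duplicate before the second pick.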