Ziwei Wang, Jie Song, Yixuan Liu, Jingtong Zhao. European Journal of Operational Research, vol. 326, no. 1. DOI: 10.1016/j.ejor.2025.09.012. Published 2025-09-22 (Journal Article; JCR Q1, Operations Research & Management Science).
Reinforcement learning algorithm for reusable resource allocation with unknown rental time distribution
We explore a scenario where a platform must decide on the price and type of reusable resources for sequentially arriving customers. The product is rented for a random period, during which the platform also extracts rewards based on a prearranged agreement. The expected reward varies over the usage time, and the platform aims to maximize revenue over a finite horizon. Two primary challenges arise: the stochastic usage time introduces uncertainty that affects product availability, and the platform initially lacks knowledge of the reward and usage time distributions. In contrast to conventional online learning, where usage time distributions are parametric, our problem allows the distribution type itself to be unknown. To overcome these challenges, we formulate the problem as a Markov decision process and model the usage time distribution through its hazard rate. We first introduce a greedy policy in the full-information setting with a provable 1/2-approximation ratio. We then develop a reinforcement learning algorithm that implements this policy when the parameters are unknown, accommodating non-parametric distributions and time-varying rewards, and we prove that the algorithm achieves sublinear regret against the greedy policy. Numerical experiments on synthetic data as well as a real dataset from TikTok demonstrate the effectiveness of our method.
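The abstract states that the usage time distribution is modeled through its hazard rate, i.e., the probability that a rented product is returned in period t given that it is still out at t. As a minimal illustrative sketch only (the paper's actual estimator and learning algorithm are not given in the abstract), a non-parametric empirical hazard rate for discrete rental durations could be computed as:

```python
def estimate_hazard(durations, max_t):
    """Empirical hazard rate for discrete rental durations.

    hazard[t-1] estimates P(return at period t | still rented at start of t),
    computed as (# rentals ending at t) / (# rentals still active at t).
    This is a toy sketch, not the algorithm from the paper.
    """
    hazard = []
    for t in range(1, max_t + 1):
        at_risk = sum(1 for d in durations if d >= t)  # still rented at t
        ended = sum(1 for d in durations if d == t)    # returned at t
        hazard.append(ended / at_risk if at_risk else 0.0)
    return hazard


# Example: four observed rentals lasting 1, 1, 2, and 3 periods.
print(estimate_hazard([1, 1, 2, 3], 3))  # → [0.5, 0.5, 1.0]
```

Working with the hazard rate rather than the duration distribution directly is convenient in this setting because it makes no parametric assumption about the distribution's shape, matching the paper's allowance for unknown distribution types.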
Journal description:
The European Journal of Operational Research (EJOR) publishes high-quality, original papers that contribute to the methodology of operational research (OR) and to the practice of decision making.