Learning-Based Optimal Admission Control in a Single-Server Queuing System

Q1 Mathematics
Asaf Cohen, Vijay Subramanian, Yili Zhang
Journal: Stochastic Systems
Publication date: 2024-01-05
DOI: 10.1287/stsy.2022.0042 (https://doi.org/10.1287/stsy.2022.0042)
Citations: 0

Abstract

We consider a long-term average profit-maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue length of the system. Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24] shows that, if all the parameters of the model are known, then it is optimal to use a static threshold policy: admit if the queue length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full-information model of Naor (1969). We show that the algorithm achieves an O(1) regret when all optimal thresholds with full information are nonzero and achieves an [Formula: see text] regret for any specified [Formula: see text] in the case that an optimal threshold with full information is 0 (i.e., an optimal policy is to reject all arrivals), where N is the number of arrivals.

Funding: A. Cohen is partially supported by the National Science Foundation [Grant DMS-2006305]. V. Subramanian is supported in part by the NSF [Grants CCF-2008130, ECCS-2038416, CNS-1955777, and CMMI-2240981].
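The full-information benchmark in the abstract, a static threshold policy, can be illustrated with a short calculation. The sketch below is not the paper's learning algorithm; it only evaluates the long-run average profit of each threshold K when the rates are known and picks the best one. It assumes the holding cost is charged per unit of time for every customer in the system (including the one in service), which is one common reading of Naor's model. The names `lam`, `mu`, `reward`, `cost`, and `k_max` are illustrative and do not come from the paper.

```python
def stationary_distribution(lam, mu, K):
    """Stationary distribution of an M/M/1 queue that rejects arrivals
    once K customers are present (an M/M/1/K queue)."""
    rho = lam / mu
    weights = [rho ** n for n in range(K + 1)]  # pi_n is proportional to rho^n
    total = sum(weights)
    return [w / total for w in weights]


def average_profit(lam, mu, reward, cost, K):
    """Long-run average profit per unit time under threshold K.

    Assumption: the cost accrues per unit time for each customer
    in the system, including the one in service.
    """
    pi = stationary_distribution(lam, mu, K)
    throughput = lam * (1.0 - pi[K])  # rate of admitted (hence served) customers
    mean_in_system = sum(n * p for n, p in enumerate(pi))
    return reward * throughput - cost * mean_in_system


def best_threshold(lam, mu, reward, cost, k_max=50):
    """Search thresholds 0..k_max; K = 0 means 'reject all arrivals'."""
    profits = [average_profit(lam, mu, reward, cost, K) for K in range(k_max + 1)]
    best = max(range(k_max + 1), key=profits.__getitem__)
    return best, profits[best]


if __name__ == "__main__":
    K, profit = best_threshold(lam=1.0, mu=1.0, reward=5.0, cost=1.0)
    print(f"best threshold: {K}, average profit: {profit:.4f}")
```

With a small enough reward relative to the cost, the search returns K = 0, which corresponds to the reject-all case that the abstract treats separately in its regret result.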
Source journal
Stochastic Systems (Decision Sciences: Statistics, Probability and Uncertainty)
CiteScore: 3.70
Self-citation rate: 0.00%
Number of publications: 18