Optimistic Algorithms for Safe Linear Bandits Under General Constraints
Spencer Hutchinson; Arghavan Zibaie; Ramtin Pedarsani; Mahnoosh Alizadeh
IEEE Open Journal of Control Systems, vol. 4, pp. 103-116, 2025-04-07. DOI: 10.1109/OJCSYS.2025.3558118
https://ieeexplore.ieee.org/document/10950393/
The stochastic linear bandit problem has emerged as a fundamental building block in machine learning and control, and as a realistic model for many applications. By equipping this classical problem with safety constraints, the safe linear bandit problem further broadens its relevance to safety-critical applications. However, most existing algorithms for safe linear bandits consider only linear constraints, making them inadequate for many real-world applications, which often have non-linear constraints. To address this limitation, we study the problem of safe linear bandits under general (non-linear) constraints. Under a novel constraint regularity condition that is weaker than convexity, we give two algorithms with $\tilde{\mathcal{O}}(d\sqrt{T})$ regret. We then give efficient implementations of these algorithms for several specific settings. Lastly, we give simulation results demonstrating the effectiveness of our algorithms in choosing dynamic pricing signals for a demand response problem under distribution power flow constraints.
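The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general pattern of optimism about reward combined with pessimism about safety; it is not a reproduction of the paper's two algorithms. Everything specific here is an assumption made for illustration: a finite action set, a constraint of the form g(Ax) <= 0 with unknown matrix A and a known Lipschitz function g, a fixed confidence radius `beta`, ridge-regression estimates, and the hypothetical function name `run_safe_optimistic_bandit`. The Lipschitz assumption stands in for the paper's regularity condition, which the abstract does not state.

```python
import numpy as np


def run_safe_optimistic_bandit(actions, theta, A, g, T=200, lam=1.0,
                               beta=1.0, lip=1.0, noise_std=0.1, seed=0):
    """Illustrative sketch (assumed setup, not the paper's algorithm).

    actions: (K, d) finite action set; row 0 should be a known-safe action.
    theta:   (d,) true reward parameter, used only to simulate feedback.
    A:       (m, d) true constraint parameter, used only to simulate feedback.
    g:       known function on the last axis; x is safe when g(A @ x) <= 0.
    lip:     assumed Lipschitz constant of g in the Euclidean norm.
    Returns the indices of the actions played in each round.
    """
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    m = A.shape[0]
    V = lam * np.eye(d)        # regularized Gram matrix
    b_r = np.zeros(d)          # running sum of x * reward
    B_c = np.zeros((d, m))     # running sum of x * constraint observation
    played = []
    for _ in range(T):
        V_inv = np.linalg.inv(V)
        theta_hat = V_inv @ b_r            # ridge estimate of theta
        A_hat = (V_inv @ B_c).T            # ridge estimate of A, shape (m, d)
        # Confidence width ||x||_{V^{-1}} for every candidate action.
        quad = np.einsum("kd,de,ke->k", actions, V_inv, actions)
        width = np.sqrt(np.maximum(quad, 0.0))
        # Optimistic reward estimate (upper confidence bound).
        ucb = actions @ theta_hat + beta * width
        # Pessimistic constraint value: an upper bound on g(A x) whenever
        # each row of A lies within beta * width of its estimate.
        pess = g(actions @ A_hat.T) + lip * np.sqrt(m) * beta * width
        safe = pess <= 0.0
        # Best optimistic action among those certified safe; if none is
        # certified, argmax falls back to index 0 (the known-safe action).
        idx = int(np.argmax(np.where(safe, ucb, -np.inf)))
        x = actions[idx]
        reward = x @ theta + noise_std * rng.standard_normal()
        y = A @ x + noise_std * rng.standard_normal(m)
        V += np.outer(x, x)
        b_r += reward * x
        B_c += np.outer(x, y)
        played.append(idx)
    return np.array(played)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 3
    # The origin is included as a trivially safe action for this choice of g.
    actions = np.vstack([np.zeros(d), rng.uniform(-1.0, 1.0, size=(50, d))])
    theta = np.array([0.5, -0.2, 0.3])
    A = np.array([[0.6, 0.0, 0.3], [0.0, 0.8, -0.1]])

    def unit_ball(Y):
        # Non-linear constraint: the constraint signal must stay in the unit ball.
        return np.linalg.norm(Y, axis=-1) - 1.0

    played = run_safe_optimistic_bandit(actions, theta, A, unit_ball, T=100)
    print("distinct actions played:", np.unique(played).size)
```

In this sketch the origin serves as the known-safe starting point: its confidence width is zero, so it always passes the pessimistic check, and the certified safe set grows as the estimates of A tighten. A principled choice of `beta` would come from a self-normalized confidence bound, which is omitted here.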