Optimistic Algorithms for Safe Linear Bandits Under General Constraints

IEEE Open Journal of Control Systems, published 2025-04-07, DOI: 10.1109/OJCSYS.2025.3558118

Spencer Hutchinson, Arghavan Zibaie, Ramtin Pedarsani, Mahnoosh Alizadeh

Volume 4, pages 103-116. Available on IEEE Xplore: https://ieeexplore.ieee.org/document/10950393/


Abstract

The stochastic linear bandit problem has emerged as a fundamental building block in machine learning and control, and as a realistic model for many applications. Equipping this classical problem with safety constraints yields the safe linear bandit problem, which further broadens its relevance to safety-critical applications. However, most existing algorithms for safe linear bandits consider only linear constraints, which makes them inadequate for many real-world applications with non-linear constraints. To alleviate this limitation, we study the safe linear bandit problem under general (non-linear) constraints. Under a novel constraint regularity condition that is weaker than convexity, we give two algorithms with $\tilde{\mathcal{O}}(d\sqrt{T})$ regret, where $d$ is the dimension of the action space and $T$ is the number of rounds. We then give efficient implementations of these algorithms for several specific settings. Lastly, we present simulation results demonstrating the effectiveness of our algorithms in choosing dynamic pricing signals for a demand response problem under distribution power flow constraints.
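To make the setting concrete, the following is a minimal sketch of the generic "optimism in the face of uncertainty" principle for stochastic linear bandits (an OFUL-style upper-confidence rule with a ridge-regression estimate). It is not the authors' algorithm. It assumes a hypothetical finite action grid, a hypothetical non-linear constraint $g(x) = \|x\|^2 - 1 \le 0$ that is known in advance and only used to filter the action set (a simplification that does not reproduce the paper's treatment of general constraints), and a fixed, uncalibrated confidence width `beta`. The names `ucb_scores` and `run_optimistic_bandit` are illustrative only.

```python
import numpy as np


def ucb_scores(X, V, b, beta):
    """Optimistic reward estimate for each row x of X:
    theta_hat^T x + beta * ||x||_{V^{-1}}, with theta_hat = V^{-1} b (ridge regression)."""
    V_inv = np.linalg.inv(V)
    theta_hat = V_inv @ b
    # Quadratic form x^T V^{-1} x for every row; clip tiny negatives caused by round-off.
    widths = np.sqrt(np.maximum(np.einsum("ij,jk,ik->i", X, V_inv, X), 0.0))
    return X @ theta_hat + beta * widths


def run_optimistic_bandit(theta_star, actions, constraint, T,
                          beta=1.0, lam=1.0, noise=0.1, seed=0):
    """Play T rounds of an OFUL-style optimistic rule over actions with constraint(x) <= 0.

    Hypothetical simplification: the constraint is known and only filters a finite
    action set. Returns the cumulative regret against the best safe action and the
    sequence of actions played.
    """
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    safe = actions[np.array([constraint(x) <= 0 for x in actions])]
    V = lam * np.eye(d)   # regularized Gram matrix
    b = np.zeros(d)       # running sum of reward-weighted actions
    best = np.max(safe @ theta_star)
    regret, played = 0.0, []
    for _ in range(T):
        # Optimism: pick the safe action with the largest upper confidence bound.
        x = safe[np.argmax(ucb_scores(safe, V, b, beta))]
        r = x @ theta_star + noise * rng.standard_normal()  # noisy linear reward
        V += np.outer(x, x)
        b += r * x
        regret += best - x @ theta_star
        played.append(x)
    return regret, np.array(played)


def unit_ball(x):
    """Hypothetical non-linear constraint g(x) = ||x||^2 - 1 (safe iff g(x) <= 0)."""
    return x @ x - 1.0


if __name__ == "__main__":
    grid = np.linspace(-1.5, 1.5, 11)
    actions = np.array([[u, v] for u in grid for v in grid])
    regret, played = run_optimistic_bandit(np.array([1.0, 0.5]), actions, unit_ball, T=500)
    print(f"cumulative regret after 500 rounds: {regret:.2f}")
```

With a properly calibrated confidence radius, the standard OFUL analysis gives regret of order $d\sqrt{T}$ up to logarithmic factors, the same rate quoted above; the fixed `beta` in this sketch carries no such guarantee and is chosen only to keep the example short.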



