On Dynamic Regret and Constraint Violations in Constrained Online Convex Optimization

2022 20th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt). Pub Date: 2022-09-19. DOI: 10.23919/WiOpt56218.2022.9930613

R. Vaze


Citations: 2

Abstract



A constrained version of the online convex optimization (OCO) problem is considered. Time is slotted, and in each slot an action is chosen first. Subsequently, the loss function and the constraint violation penalty evaluated at the chosen action point are revealed. For each slot, both the loss function and the function defining the constraint set are assumed to be smooth and strongly convex. In addition, once an action is chosen, local information about a feasible set within a small neighborhood of the current action is also revealed. Given this feedback, an algorithm is allowed to compute at most one gradient, at a point of its choice, to choose the next action. The goal of an algorithm is to simultaneously minimize the dynamic regret (loss incurred compared to the oracle's loss) and the constraint violation penalty (penalty accrued compared to the oracle's penalty). We propose an algorithm that follows projected gradient descent over a suitably chosen set around the current action. We show that both the dynamic regret and the constraint violation are order-wise bounded by the path-length, that is, the sum of the distances between consecutive optimal actions. Moreover, we show that the derived bounds are the best possible.
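The quantities named in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm: it assumes quadratic losses f_t(x) = 0.5 * ||x - opt_t||^2 with slowly drifting minimizers, uses a Euclidean ball of hypothetical radius `radius` around the current action as a stand-in for the "suitably chosen set", and leaves out the constraint functions entirely. The names `project_ball` and `run_sketch` and all parameter values are illustrative assumptions. It takes one gradient step per slot and accumulates the dynamic regret next to the path-length.

```python
import numpy as np


def project_ball(x, center, radius):
    """Euclidean projection of x onto the ball {y : ||y - center|| <= radius}."""
    offset = x - center
    dist = np.linalg.norm(offset)
    if dist <= radius:
        return x
    return center + radius * offset / dist


def run_sketch(T=200, step=0.5, radius=0.5, drift=0.01, seed=0):
    """Simulate one-gradient-per-slot projected gradient descent.

    Assumed setup (not from the paper): f_t(x) = 0.5 * ||x - opt_t||^2,
    so the oracle's loss is 0 in every slot and the dynamic regret is
    just the sum of the learner's losses.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(2)      # current action
    opt = np.zeros(2)    # minimizer of the current slot's loss
    regret = 0.0
    path_length = 0.0
    for _ in range(T):
        # The action x is played, then the loss for this slot is revealed.
        regret += 0.5 * float(np.sum((x - opt) ** 2))
        # Single gradient evaluation, at the action just played.
        grad = x - opt
        # Gradient step, projected onto a small ball around the current action.
        x = project_ball(x - step * grad, x, radius)
        # The optimum drifts; its movement adds to the path-length.
        new_opt = opt + drift * rng.standard_normal(2)
        path_length += float(np.linalg.norm(new_opt - opt))
        opt = new_opt
    return regret, path_length


if __name__ == "__main__":
    regret, path_length = run_sketch()
    print(f"dynamic regret: {regret:.4f}, path-length: {path_length:.4f}")
```

In this toy setting the accumulated regret stays below the path-length, which is consistent in spirit with the order-wise bound stated in the abstract, though a single simulated run is only an illustration and not evidence for the theorem.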
