Mathematics of Operations Research: Latest Articles (Page 4)

Corruption-Robust Exploration in Episodic Reinforcement Learning

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-05-23 DOI: 10.1287/moor.2021.0202

Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

Abstract: We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multiarmed bandits. We provide a framework that modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on optimism in the face of uncertainty by complementing them with principles from action elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms that (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees that degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) and linear Markov decision process settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely independent and identically distributed transitions in the bandit-feedback model for episodic reinforcement learning.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2021.0202.

Citations: 0

Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-05-23 DOI: 10.1287/moor.2023.0172

Christoph Reisinger, Jonathan Tam

Citations: 0

Liquid Welfare Guarantees for No-Regret Learning in Sequential Budgeted Auctions

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-05-14 DOI: 10.1287/moor.2023.0274

Giannis Fikioris, Éva Tardos

Abstract: We study the liquid welfare in sequential first-price auctions with budgeted buyers. We use a behavioral model for the buyers, assuming a learning style guarantee: the utility of each buyer is within a [Formula: see text] factor ([Formula: see text]) of the utility achievable by shading their value with the same factor at each iteration. We show a [Formula: see text] price of anarchy for liquid welfare when valuations are additive. This is in stark contrast to sequential second-price auctions, where the resulting liquid welfare can be arbitrarily smaller than the maximum liquid welfare, even when [Formula: see text]. We prove a lower bound of [Formula: see text] on the liquid welfare loss under the given assumption in first-price auctions. Our liquid welfare results extend when buyers have submodular valuations over the set of items they win across iterations with a slightly worse price of anarchy bound of [Formula: see text] compared with the guarantee for the additive case.

Funding: G. Fikioris is supported in part by the Air Force Office of Scientific Research [Grants FA9550-19-1-0183 and FA9550-23-1-0068], the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program, and the Onassis Foundation [Scholarship ID F ZS 068-1/2022-2023]. É. Tardos is supported in part by the NSF [Grant CCF-1408673] and AFOSR [Grants FA9550-19-1-0183, FA9550-23-1-0410, and FA9550-23-1-0068].

Citations: 0

A Theory of Alternating Paths and Blossoms from the Perspective of Minimum Length

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-05-07 DOI: 10.1287/moor.2020.0388

Vijay V. Vazirani

Citations: 0

Is There a Golden Parachute in Sannikov’s Principal–Agent Problem?

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-05-06 DOI: 10.1287/moor.2022.0305

Dylan Possamaï, Nizar Touzi

Abstract: This paper provides a complete review of the continuous-time optimal contracting problem introduced by Sannikov in the extended context allowing for possibly different discount rates for both parties. The agent’s problem is to seek optimal effort given the compensation scheme proposed by the principal over a random horizon. Then, given the optimal agent’s response, the principal determines the best compensation scheme in terms of running payment, retirement, and lump-sum payment at retirement. A golden parachute is a situation where the agent ceases any effort at some positive stopping time and receives a payment afterward, possibly in the form of a lump-sum payment or of a continuous stream of payments. We show that a golden parachute only exists in certain specific circumstances. This is in contrast with the results claimed by Sannikov, where the only requirement is a positive agent’s marginal cost of effort at zero. In the general case, we prove that an agent with positive reservation utility is either never retired by the principal or retired above some given threshold (as in Sannikov’s solution). We show that different discount factors induce a facelifted utility function, which allows us to reduce the analysis to a setting similar to the equal-discount rates one. Finally, we also confirm that an agent with small reservation utility does have an informational rent, meaning that the principal optimally offers him a contract with strictly higher utility than his participation value.

Citations: 0

Learning and Balancing Unknown Loads in Large-Scale Systems

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-05-03 DOI: 10.1287/moor.2021.0212

Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden

Abstract: Consider a system of identical server pools where tasks with exponentially distributed service times arrive as a time-inhomogeneous Poisson process. An admission threshold is used in an inner control loop to assign incoming tasks to server pools, while in an outer control loop, a learning scheme adjusts this threshold over time to keep it aligned with the unknown offered load of the system. In a many-server regime, we prove that the learning scheme reaches an equilibrium along intervals of time when the normalized offered load per server pool is suitably bounded and that this results in a balanced distribution of the load. Furthermore, we establish a similar result when tasks with Coxian distributed service times arrive at a constant rate and the threshold is adjusted using only the total number of tasks in the system. The novel proof technique developed in this paper, which differs from a traditional fluid limit analysis, allows us to handle rapid variations of the first learning scheme, triggered by excursions of the occupancy process that have vanishing size. Moreover, our approach allows us to characterize the asymptotic behavior of the system with Coxian distributed service times without relying on a fluid limit of a detailed state descriptor.

Funding: The work in this paper was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek [Gravitation Grant NETWORKS-024.002.003 and Vici Grant 202.068].

Citations: 0

Estimating a Function and Its Derivatives Under a Smoothness Condition

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-05-02 DOI: 10.1287/moor.2020.0161

Eunji Lim

Citations: 0

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-04-29 DOI: 10.1287/moor.2022.0357

Ofelia Bonesini, Luciano Campi, Markus Fischer

Citations: 0

Convexification of Bilinear Terms over Network Polytopes

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-04-22 DOI: 10.1287/moor.2023.0001

Erfan Khademnia, Danial Davarnia

Abstract: It is well-known that the McCormick relaxation for the bilinear constraint z = xy gives the convex hull over the box domains for x and y. In network applications where the domain of bilinear variables is described by a network polytope, the McCormick relaxation, also referred to as linearization, fails to provide the convex hull and often leads to poor dual bounds. We study the convex hull of the set containing bilinear constraints [Formula: see text] where x_i represents the arc-flow variable in a network polytope, and y_j is in a simplex. For the case where the simplex contains a single y variable, we introduce a systematic procedure to obtain the convex hull of the above set in the original space of variables, and show that all facet-defining inequalities of the convex hull can be obtained explicitly through identifying a special tree structure in the underlying network. For the generalization where the simplex contains multiple y variables, we design a constructive procedure to obtain an important class of facet-defining inequalities for the convex hull of the underlying bilinear set that is characterized by a special forest structure in the underlying network. Computational experiments conducted on different applications show the effectiveness of the proposed methods in improving the dual bounds obtained from alternative techniques.

Funding: This work was supported by Air Force Office of Scientific Research [Grant FA9550-23-1-0183]; National Science Foundation, Division of Civil, Mechanical and Manufacturing Innovation [Grant 2338641].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2023.0001.
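For readers unfamiliar with the McCormick relaxation that this abstract takes as its baseline, the following is a minimal sketch of the standard four McCormick inequalities for z = xy over a box. It is the textbook box relaxation only, not the network-polytope convexification the paper develops; the function name `mccormick_bounds` and its parameters are hypothetical, chosen for illustration.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Return (lower, upper) bounds on z = x*y implied by the four
    standard McCormick inequalities on the box
    [x_lo, x_hi] x [y_lo, y_hi]. Names are illustrative only."""
    # Under-estimators: z >= x_lo*y + x*y_lo - x_lo*y_lo
    #                   z >= x_hi*y + x*y_hi - x_hi*y_hi
    lower = max(x_lo * y + x * y_lo - x_lo * y_lo,
                x_hi * y + x * y_hi - x_hi * y_hi)
    # Over-estimators:  z <= x_hi*y + x*y_lo - x_hi*y_lo
    #                   z <= x_lo*y + x*y_hi - x_lo*y_hi
    upper = min(x_hi * y + x * y_lo - x_hi * y_lo,
                x_lo * y + x * y_hi - x_lo * y_hi)
    return lower, upper


if __name__ == "__main__":
    # At the center of the unit box the relaxation is loose:
    # the bounds are (0.0, 0.5) while the true product is 0.25.
    print(mccormick_bounds(0.5, 0.5, 0.0, 1.0, 0.0, 1.0))
    # At a corner of the box the bounds coincide with the product.
    print(mccormick_bounds(1.0, 1.0, 0.0, 1.0, 0.0, 1.0))
```

The gap at interior points is harmless when x and y range over a full box, since the inequalities describe the convex hull there. Once the variables are restricted to a smaller polytope, as in the network setting of the abstract, the same inequalities remain valid but are no longer tight.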

Citations: 0

Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation

IF 1.7, Zone 3 (Mathematics)

Mathematics of Operations Research Pub Date : 2024-04-16 DOI: 10.1287/moor.2022.0179

Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov

Abstract: This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a d-dimensional linear system [Formula: see text] for which [Formula: see text] can only be estimated by (asymptotically) unbiased observations [Formula: see text]. We consider here the case where [Formula: see text] is either a sequence of independent and identically distributed random variables or a uniformly geometrically ergodic Markov chain. We derive pth moment and high-probability deviation bounds for the iterates defined by LSA and its Polyak–Ruppert-averaged version. Our finite-time instance-dependent bounds for the averaged LSA iterates are sharp in the sense that the leading term we obtain coincides with the local asymptotic minimax limit. Moreover, the remainder terms of our bounds admit a tight dependence on the mixing time [Formula: see text] of the underlying chain and the norm of the noise variables. We emphasize that our result requires the LSA step size to scale only with the logarithm of the problem dimension d.

Funding: The work of A. Durmus and E. Moulines was partly supported by [Grant ANR-19-CHIA-0002]. This project received funding from the European Research Council [ERC-SyG OCEAN Grant 101071601]. The research of A. Naumov and S. Samsonov was prepared within the framework of the HSE University Basic Research Program.
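As a rough illustration of the kind of procedure this abstract analyzes, the sketch below runs fixed-step LSA with Polyak–Ruppert averaging on a small linear system. The formulas in the abstract are elided, so the recursion used here (theta ← theta − step · (A_n theta − b_n), followed by a plain average of the iterates) is the common textbook form and is an assumption, as are the i.i.d. Gaussian noise model, the function name `lsa_polyak_ruppert`, and all parameter values.

```python
import random


def lsa_polyak_ruppert(A, b, step, n_iters, noise_std, seed=0):
    """Fixed-step linear stochastic approximation for A theta = b,
    observed through noisy copies (A_n, b_n), plus the running
    Polyak-Ruppert average of the iterates.

    Assumed model (not taken from the paper): every entry of A_n and
    b_n is the true entry plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    d = len(b)
    theta = [0.0] * d
    running_sum = [0.0] * d
    for _ in range(n_iters):
        # Draw one noisy observation of the system.
        A_n = [[A[i][j] + rng.gauss(0.0, noise_std) for j in range(d)]
               for i in range(d)]
        b_n = [b[i] + rng.gauss(0.0, noise_std) for i in range(d)]
        # Residual of the noisy system at the current iterate.
        residual = [sum(A_n[i][j] * theta[j] for j in range(d)) - b_n[i]
                    for i in range(d)]
        # Fixed-step LSA update.
        theta = [theta[i] - step * residual[i] for i in range(d)]
        running_sum = [running_sum[i] + theta[i] for i in range(d)]
    # Polyak-Ruppert average over all iterates produced so far.
    theta_bar = [s / n_iters for s in running_sum]
    return theta, theta_bar


if __name__ == "__main__":
    A = [[2.0, 0.5], [0.5, 1.0]]
    b = [1.0, 1.0]  # exact solution is (2/7, 6/7)
    last, averaged = lsa_polyak_ruppert(A, b, step=0.05, n_iters=20000,
                                        noise_std=0.1)
    print("last iterate:    ", last)
    print("averaged iterate:", averaged)
```

With a fixed step size the last iterate keeps fluctuating around the solution, whereas the average smooths those fluctuations out; the paper's contribution is to quantify this in finite time, which the sketch does not attempt.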

Citations: 0