Mathematics of Operations Research: Latest Articles

Corruption-Robust Exploration in Episodic Reinforcement Learning
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-05-23 DOI: 10.1287/moor.2021.0202
Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
Abstract: We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multiarmed bandits. We provide a framework that modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on optimism in the face of uncertainty by complementing them with principles from action elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms that (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees that degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) and linear Markov decision process settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely independent and identically distributed transitions in the bandit-feedback model for episodic reinforcement learning.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2021.0202.
Citations: 0
Markov Decision Processes with Observation Costs: Framework and Computation with a Penalty Scheme
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-05-23 DOI: 10.1287/moor.2023.0172
Christoph Reisinger, Jonathan Tam
Abstract: We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimization of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems as well as an extension with parameter uncertainty. By including the time elapsed from observations as part of the augmented Markov system, the value function satisfies a system of quasivariational inequalities (QVIs). Such a class of QVIs can be seen as an extension to the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies the uniqueness of solutions to our proposed problem. Penalty methods are then utilized to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications that illustrate our framework.
Funding: J. Tam is supported by the Engineering and Physical Sciences Research Council [Grant 2269738].
Citations: 0
Liquid Welfare Guarantees for No-Regret Learning in Sequential Budgeted Auctions
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-05-14 DOI: 10.1287/moor.2023.0274
Giannis Fikioris, Éva Tardos
Abstract: We study the liquid welfare in sequential first-price auctions with budgeted buyers. We use a behavioral model for the buyers, assuming a learning style guarantee: the utility of each buyer is within a [Formula: see text] factor ([Formula: see text]) of the utility achievable by shading their value with the same factor at each iteration. We show a [Formula: see text] price of anarchy for liquid welfare when valuations are additive. This is in stark contrast to sequential second-price auctions, where the resulting liquid welfare can be arbitrarily smaller than the maximum liquid welfare, even when [Formula: see text]. We prove a lower bound of [Formula: see text] on the liquid welfare loss under the given assumption in first-price auctions. Our liquid welfare results extend when buyers have submodular valuations over the set of items they win across iterations with a slightly worse price of anarchy bound of [Formula: see text] compared with the guarantee for the additive case.
Funding: G. Fikioris is supported in part by the Air Force Office of Scientific Research [Grants FA9550-19-1-0183 and FA9550-23-1-0068], the Department of Defense (DoD) through the National Defense Science & Engineering Graduate (NDSEG) Fellowship Program, and the Onassis Foundation [Scholarship ID F ZS 068-1/2022-2023]. É. Tardos is supported in part by the NSF [Grant CCF-1408673] and AFOSR [Grants FA9550-19-1-0183, FA9550-23-1-0410, and FA9550-23-1-0068].
Citations: 0
A Theory of Alternating Paths and Blossoms from the Perspective of Minimum Length
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-05-07 DOI: 10.1287/moor.2020.0388
Vijay V. Vazirani
Abstract: The Micali–Vazirani (MV) algorithm for finding a maximum cardinality matching in general graphs, which was published in 1980, remains to this day the most efficient known algorithm for the problem. The current paper gives the first complete and correct proof of this algorithm. The MV algorithm resorts to finding minimum-length augmenting paths. However, such paths fail to satisfy an elementary property, called breadth first search honesty in this paper. In the absence of this property, an exponential time algorithm appears to be called for, just for finding one such path. On the other hand, the MV algorithm accomplishes this and additional tasks in linear time. The saving grace is the various “footholds” offered by the underlying structure, which the algorithm uses in order to perform its key tasks efficiently. The theory expounded in this paper elucidates this rich structure and yields a proof of correctness of the algorithm. It may also be of independent interest as a set of well-knit graph-theoretic facts.
Funding: This work was supported in part by the National Science Foundation [Grant CCF-2230414].
Citations: 0
Is There a Golden Parachute in Sannikov’s Principal–Agent Problem?
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-05-06 DOI: 10.1287/moor.2022.0305
Dylan Possamaï, Nizar Touzi
Abstract: This paper provides a complete review of the continuous-time optimal contracting problem introduced by Sannikov in the extended context allowing for possibly different discount rates for both parties. The agent’s problem is to seek the optimal effort given the compensation scheme proposed by the principal over a random horizon. Then, given the optimal agent’s response, the principal determines the best compensation scheme in terms of running payment, retirement, and lump-sum payment at retirement. A golden parachute is a situation where the agent ceases any effort at some positive stopping time and receives a payment afterward, possibly under the form of a lump-sum payment or of a continuous stream of payments. We show that a golden parachute only exists in certain specific circumstances. This is in contrast with the results claimed by Sannikov, where the only requirement is a positive agent’s marginal cost of effort at zero. In the general case, we prove that an agent with positive reservation utility is either never retired by the principal or retired above some given threshold (as in Sannikov’s solution). We show that different discount factors induce a facelifted utility function, which allows us to reduce the analysis to a setting similar to the equal-discount rates one. Finally, we also confirm that an agent with small reservation utility does have an informational rent, meaning that the principal optimally offers him a contract with strictly higher utility than his participation value.
Citations: 0
Learning and Balancing Unknown Loads in Large-Scale Systems
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-05-03 DOI: 10.1287/moor.2021.0212
Diego Goldsztajn, Sem C. Borst, Johan S. H. van Leeuwaarden
Abstract: Consider a system of identical server pools where tasks with exponentially distributed service times arrive as a time-inhomogeneous Poisson process. An admission threshold is used in an inner control loop to assign incoming tasks to server pools, while in an outer control loop, a learning scheme adjusts this threshold over time to keep it aligned with the unknown offered load of the system. In a many-server regime, we prove that the learning scheme reaches an equilibrium along intervals of time when the normalized offered load per server pool is suitably bounded and that this results in a balanced distribution of the load. Furthermore, we establish a similar result when tasks with Coxian distributed service times arrive at a constant rate and the threshold is adjusted using only the total number of tasks in the system. The novel proof technique developed in this paper, which differs from a traditional fluid limit analysis, allows us to handle rapid variations of the first learning scheme, triggered by excursions of the occupancy process that have vanishing size. Moreover, our approach allows us to characterize the asymptotic behavior of the system with Coxian distributed service times without relying on a fluid limit of a detailed state descriptor.
Funding: The work in this paper was supported by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek [Gravitation Grant NETWORKS-024.002.003 and Vici Grant 202.068].
Citations: 0
Estimating a Function and Its Derivatives Under a Smoothness Condition
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-05-02 DOI: 10.1287/moor.2020.0161
Eunji Lim
Abstract: We consider the problem of estimating an unknown function [Formula: see text] and its partial derivatives from a noisy data set of n observations, where we make no assumptions about [Formula: see text] except that it is smooth in the sense that it has square integrable partial derivatives of order m. A natural candidate for the estimator of [Formula: see text] in such a case is the best fit to the data set that satisfies a certain smoothness condition. This estimator can be seen as a least squares estimator subject to an upper bound on some measure of smoothness. Another useful estimator is the one that minimizes the degree of smoothness subject to an upper bound on the average of squared errors. We prove that these two estimators are computable as solutions to quadratic programs, establish the consistency of these estimators and their partial derivatives, and study the convergence rate as [Formula: see text]. The effectiveness of the estimators is illustrated numerically in a setting where the value of a stock option and its second derivative are estimated as functions of the underlying stock price.
Citations: 0
Correlated Equilibria for Mean Field Games with Progressive Strategies
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-04-29 DOI: 10.1287/moor.2022.0357
Ofelia Bonesini, Luciano Campi, Markus Fischer
Abstract: In a discrete space and time framework, we study the mean field game limit for a class of symmetric N-player games based on the notion of correlated equilibrium. We give a definition of correlated solution that allows us to construct approximate N-player correlated equilibria that are robust with respect to progressive deviations. We illustrate our definition by way of an example with explicit solutions.
Funding: O. Bonesini acknowledges financial support from Engineering and Physical Sciences Research Council [Grant EP/T032146/1]. M. Fischer acknowledges partial support through the University of Padua [Research Project BIRD229791 “Stochastic mean field control and the Schrödinger problem”].
Citations: 0
Convexification of Bilinear Terms over Network Polytopes
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-04-22 DOI: 10.1287/moor.2023.0001
Erfan Khademnia, Danial Davarnia
Abstract: It is well-known that the McCormick relaxation for the bilinear constraint z = xy gives the convex hull over the box domains for x and y. In network applications where the domain of bilinear variables is described by a network polytope, the McCormick relaxation, also referred to as linearization, fails to provide the convex hull and often leads to poor dual bounds. We study the convex hull of the set containing bilinear constraints [Formula: see text] where x_i represents the arc-flow variable in a network polytope, and y_j is in a simplex. For the case where the simplex contains a single y variable, we introduce a systematic procedure to obtain the convex hull of the above set in the original space of variables, and show that all facet-defining inequalities of the convex hull can be obtained explicitly through identifying a special tree structure in the underlying network. For the generalization where the simplex contains multiple y variables, we design a constructive procedure to obtain an important class of facet-defining inequalities for the convex hull of the underlying bilinear set that is characterized by a special forest structure in the underlying network. Computational experiments conducted on different applications show the effectiveness of the proposed methods in improving the dual bounds obtained from alternative techniques.
Funding: This work was supported by Air Force Office of Scientific Research [Grant FA9550-23-1-0183]; National Science Foundation, Division of Civil, Mechanical and Manufacturing Innovation [Grant 2338641].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2023.0001.
Citations: 0
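To make the McCormick relaxation named in the abstract above concrete, here is a minimal Python sketch of the standard McCormick envelope for a single bilinear term z = xy over a box. It is not taken from the paper: the function name `mccormick_bounds` and the arguments `x_lo`, `x_hi`, `y_lo`, `y_hi` are illustrative assumptions, and the sketch covers only the box case, not the network-polytope convexification the authors study.

```python
def mccormick_bounds(x, y, x_lo, x_hi, y_lo, y_hi):
    """Return (lower, upper) McCormick bounds on z = x * y.

    Assumes x_lo <= x <= x_hi and y_lo <= y <= y_hi (hypothetical names).
    Each inequality comes from expanding a product of two nonnegative
    factors, e.g. (x - x_lo) * (y - y_lo) >= 0.
    """
    # Two linear under-estimators of x * y; the envelope is their maximum.
    lower = max(
        x_lo * y + x * y_lo - x_lo * y_lo,
        x_hi * y + x * y_hi - x_hi * y_hi,
    )
    # Two linear over-estimators of x * y; the envelope is their minimum.
    upper = min(
        x_hi * y + x * y_lo - x_hi * y_lo,
        x_lo * y + x * y_hi - x_lo * y_hi,
    )
    return lower, upper


if __name__ == "__main__":
    # Midpoint of the box [0, 2] x [1, 3]: the true product is 2.0.
    print(mccormick_bounds(1.0, 2.0, 0.0, 2.0, 1.0, 3.0))  # prints (1.0, 3.0)
```

At the midpoint of the box the relaxation leaves a gap around the true product, while at the four corners of the box the lower and upper bounds coincide with xy.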
Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation
IF 1.7, Zone 3, Mathematics
Mathematics of Operations Research Pub Date: 2024-04-16 DOI: 10.1287/moor.2022.0179
Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov
Abstract: This paper provides a finite-time analysis of linear stochastic approximation (LSA) algorithms with fixed step size, a core method in statistics and machine learning. LSA is used to compute approximate solutions of a d-dimensional linear system [Formula: see text] for which [Formula: see text] can only be estimated by (asymptotically) unbiased observations [Formula: see text]. We consider here the case where [Formula: see text] is either a sequence of independent and identically distributed random variables or a uniformly geometrically ergodic Markov chain. We derive pth moment and high-probability deviation bounds for the iterates defined by LSA and its Polyak–Ruppert-averaged version. Our finite-time instance-dependent bounds for the averaged LSA iterates are sharp in the sense that the leading term we obtain coincides with the local asymptotic minimax limit. Moreover, the remainder terms of our bounds admit a tight dependence on the mixing time [Formula: see text] of the underlying chain and the norm of the noise variables. We emphasize that our result requires the LSA step size to scale only with the logarithm of the problem dimension d.
Funding: The work of A. Durmus and E. Moulines was partly supported by [Grant ANR-19-CHIA-0002]. This project received funding from the European Research Council [ERC-SyG OCEAN Grant 101071601]. The research of A. Naumov and S. Samsonov was prepared within the framework of the HSE University Basic Research Program.
Citations: 0
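The abstract above describes fixed-step LSA and its Polyak–Ruppert average only in words, so a small Python sketch may help. It is an illustration under stated assumptions, not the authors' setup: the system is assumed to have the form A θ = b, the observations are assumed to be i.i.d. Gaussian perturbations of A and b, every iterate is averaged without a burn-in period, and the names `lsa_polyak_ruppert`, `step`, and `noise_std` are hypothetical.

```python
import random


def lsa_polyak_ruppert(A, b, step, n_iters, noise_std, seed=0):
    """Run fixed-step LSA on A @ theta = b; return (last iterate, average).

    Assumed i.i.d. noise model: each step sees A_k = A + noise and
    b_k = b + noise, with independent zero-mean Gaussian entries.
    """
    rng = random.Random(seed)
    d = len(b)
    theta = [0.0] * d
    theta_sum = [0.0] * d
    for _ in range(n_iters):
        # Draw one noisy, unbiased observation of the system.
        A_k = [[A[i][j] + rng.gauss(0.0, noise_std) for j in range(d)]
               for i in range(d)]
        b_k = [b[i] + rng.gauss(0.0, noise_std) for i in range(d)]
        # LSA update: theta <- theta - step * (A_k theta - b_k).
        residual = [sum(A_k[i][j] * theta[j] for j in range(d)) - b_k[i]
                    for i in range(d)]
        theta = [theta[i] - step * residual[i] for i in range(d)]
        # Accumulate the running sum for the Polyak-Ruppert average.
        theta_sum = [theta_sum[i] + theta[i] for i in range(d)]
    theta_bar = [s / n_iters for s in theta_sum]
    return theta, theta_bar


if __name__ == "__main__":
    A = [[2.0, 0.5], [0.5, 1.0]]
    b = [1.0, 1.0]  # exact solution of A theta = b is (2/7, 6/7)
    last, avg = lsa_polyak_ruppert(A, b, step=0.05, n_iters=20000,
                                   noise_std=0.1)
    print("last iterate:", last)
    print("averaged iterate:", avg)
```

With a fixed step size the last iterate keeps fluctuating around the solution, whereas the averaged iterate smooths out that noise; the paper's bounds quantify this effect, which the sketch only illustrates numerically.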