{"title":"Steiner Cut Dominants","authors":"Michele Conforti, Volker Kaibel","doi":"10.1287/moor.2022.0280","DOIUrl":"https://doi.org/10.1287/moor.2022.0280","url":null,"abstract":"For a subset T of nodes of an undirected graph G, a T-Steiner cut is a cut [Formula: see text] with [Formula: see text] and [Formula: see text]. The T-Steiner cut dominant of G is the dominant [Formula: see text] of the convex hull of the incidence vectors of the T-Steiner cuts of G. For [Formula: see text], this is the well-understood s-t-cut dominant. Choosing T as the set of all nodes of G, we obtain the cut dominant for which an outer description in the space of the original variables is still not known. We prove that for each integer τ, there is a finite set of inequalities such that for every pair (G, T) with [Formula: see text], the nontrivial facet-defining inequalities of [Formula: see text] are the inequalities that can be obtained via iterated applications of two simple operations, starting from that set. In particular, the absolute values of the coefficients and of the right-hand sides in a description of [Formula: see text] by integral inequalities can be bounded from above by a function of [Formula: see text]. For all [Formula: see text], we provide descriptions of [Formula: see text] by facet-defining inequalities, extending the known descriptions of s-t-cut dominants.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"6 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140582186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Rates for the Regret of Offline Reinforcement Learning","authors":"Yichun Hu, Nathan Kallus, Masatoshi Uehara","doi":"10.1287/moor.2021.0167","DOIUrl":"https://doi.org/10.1287/moor.2021.0167","url":null,"abstract":"We study the regret of offline reinforcement learning in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of common approaches, such as fitted Q-iteration (FQI), suggest root-n convergence for regret, empirical behavior exhibits much faster convergence. In this paper, we present a finer regret analysis that exactly characterizes this phenomenon by providing fast rates for the regret convergence. First, we show that given any estimate for the optimal quality function, the regret of the policy it defines converges at a rate given by the exponentiation of the estimate’s pointwise convergence rate, thus speeding up the rate. The level of exponentiation depends on the level of noise in the decision-making problem rather than in the estimation problem. We establish such noise levels for linear and tabular MDPs as examples. Second, we provide new analyses of FQI and Bellman residual minimization to establish the correct pointwise convergence guarantees. As specific cases, our results imply one-over-n rates in linear cases and exponential-in-n rates in tabular cases. We extend our findings to general function approximation via regret guarantees based on L_p-convergence rates for estimating the optimal quality function rather than pointwise rates, where L_2 guarantees for nonparametric estimation can be ensured under mild conditions.Funding: This work was supported by the Division of Information and Intelligent Systems, National Science Foundation [Grant 1846210].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"47 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140297959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semidefinite Approximations for Bicliques and Bi-Independent Pairs","authors":"Monique Laurent, Sven Polak, Luis Felipe Vargas","doi":"10.1287/moor.2023.0046","DOIUrl":"https://doi.org/10.1287/moor.2023.0046","url":null,"abstract":"We investigate some graph parameters dealing with bi-independent pairs (A, B) in a bipartite graph [Formula: see text], that is, pairs (A, B) where [Formula: see text], and [Formula: see text] are independent. These parameters also allow us to study bicliques in general graphs. When maximizing the cardinality [Formula: see text], one finds the stability number [Formula: see text], well-known to be polynomial-time computable. When maximizing the product [Formula: see text], one finds the parameter g(G), shown to be NP-hard by Peeters in 2003, and when maximizing the ratio [Formula: see text], one finds h(G), introduced by Vallentin in 2020 for bounding product-free sets in finite groups. We show that h(G) is an NP-hard parameter and, as a crucial ingredient, that it is NP-complete to decide whether a bipartite graph G has a balanced maximum independent set. These hardness results motivate introducing semidefinite programming (SDP) bounds for g(G), h(G), and [Formula: see text] (the maximum cardinality of a balanced independent set). We show that these bounds can be seen as natural variations of the Lovász ϑ-number, a well-known semidefinite bound on [Formula: see text]. In addition, we formulate closed-form eigenvalue bounds, and we show relationships among them as well as with earlier spectral parameters by Hoffman and Haemers in 2001 and Vallentin in 2020.Funding: This work was supported by H2020 Marie Skłodowska-Curie Actions [Grant 813211 (POEMA)].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"23 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Marginal Values of a Stochastic Game","authors":"Luc Attia, Miquel Oliu-Barton, Raimundo Saona","doi":"10.1287/moor.2023.0297","DOIUrl":"https://doi.org/10.1287/moor.2023.0297","url":null,"abstract":"Zero-sum stochastic games are parameterized by payoffs, transitions, and possibly a discount rate. In this article, we study how the main solution concepts, the discounted and undiscounted values, vary when these parameters are perturbed. We focus on the marginal values, introduced by Mills in 1956 in the context of matrix games—that is, the directional derivatives of the value along any fixed perturbation. We provide a formula for the marginal values of a discounted stochastic game. Further, under mild assumptions on the perturbation, we provide a formula for their limit as the discount rate vanishes and for the marginal values of an undiscounted stochastic game. We also show, via an example, that the latter two differ in general.Funding: This work was supported by Fondation CFM pour la Recherche; the European Research Council [Grant ERC-CoG-863818 (ForM-SMArt)]; and Agence Nationale de la Recherche [Grant ANR-21-CE40-0020].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"50 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Convergence and Stability of Coupled Belief-Strategy Learning Dynamics in Continuous Games","authors":"Manxi Wu, Saurabh Amin, Asuman Ozdaglar","doi":"10.1287/moor.2022.0161","DOIUrl":"https://doi.org/10.1287/moor.2022.0161","url":null,"abstract":"We propose a learning dynamics to model how strategic agents repeatedly play a continuous game while relying on an information platform to learn an unknown payoff-relevant parameter. In each time step, the platform updates a belief estimate of the parameter based on players’ strategies and realized payoffs using Bayes’ rule. Then, players adopt a generic learning rule to adjust their strategies based on the updated belief. We present results on the convergence of beliefs and strategies and the properties of convergent fixed points of the dynamics. We obtain necessary and sufficient conditions for the existence of globally stable fixed points. We also provide sufficient conditions for the local stability of fixed points. These results provide an approach to analyzing the long-term outcomes that arise from the interplay between Bayesian belief learning and strategy learning in games and enable us to characterize conditions under which learning leads to a complete information equilibrium.Funding: Financial support from the Air Force Office of Scientific Research [Project Building Attack Resilience into Complex Networks], the Simons Institute [research fellowship], and a Michael Hammer Fellowship is gratefully acknowledged.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"72 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140147400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP","authors":"Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant","doi":"10.1287/moor.2022.0139","DOIUrl":"https://doi.org/10.1287/moor.2022.0139","url":null,"abstract":"We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find the stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address the issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly under some mild assumptions. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated estimation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.Funding: This work was supported by the Office of Naval Research Global [Grant N0001419-1-2566], the Division of Computer and Network Systems [Grant 21-06801], the Army Research Office [Grant W911NF-19-1-0379], and the Division of Computing and Communication Foundations [Grants 17-04970 and 19-34986].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"7 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140107747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parametric Semidefinite Programming: Geometry of the Trajectory of Solutions","authors":"Antonio Bellon, Didier Henrion, Vyacheslav Kungurtsev, Jakub Mareček","doi":"10.1287/moor.2021.0097","DOIUrl":"https://doi.org/10.1287/moor.2021.0097","url":null,"abstract":"In many applications, solutions of convex optimization problems are updated on-line, as functions of time. In this paper, we consider parametric semidefinite programs, which are linear optimization problems in the semidefinite cone whose coefficients (input data) depend on a time parameter. We are interested in the geometry of the solution (output data) trajectory, defined as the set of solutions depending on the parameter. We propose an exhaustive description of the geometry of the solution trajectory. As our main result, we show that only six distinct behaviors can be observed at a neighborhood of a given point along the solution trajectory. Each possible behavior is then illustrated by an example.Funding: This work was supported by OP RDE [Grant CZ.02.1.01/0.0/0.0/16_019/0000765].","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"86 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140070190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the (Im-)Possibility of Representing Probability Distributions as a Difference of I.I.D. Noise Terms","authors":"Christian Ewerhart, Marco Serena","doi":"10.1287/moor.2023.0081","DOIUrl":"https://doi.org/10.1287/moor.2023.0081","url":null,"abstract":"A random variable is difference-form decomposable (DFD) if it may be written as the difference of two i.i.d. random terms. We show that densities of such variables exhibit a remarkable degree of structure. Specifically, a DFD density can be neither approximately uniform, nor quasiconvex, nor strictly concave. On the other hand, a DFD density need, in general, be neither unimodal nor logconcave. Regarding smoothness, we show that a compactly supported DFD density cannot be analytic and will often exhibit a kink even if its components are smooth. The analysis highlights the risks for model consistency resulting from the strategy, widely adopted in the economics literature, of imposing assumptions directly on a difference of noise terms rather than on its components.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"276 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140070277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strongly Convergent Homogeneous Approximations to Inhomogeneous Markov Jump Processes and Applications","authors":"Martin Bladt, Oscar Peralta","doi":"10.1287/moor.2022.0153","DOIUrl":"https://doi.org/10.1287/moor.2022.0153","url":null,"abstract":"The study of time-inhomogeneous Markov jump processes is a traditional topic within probability theory that has recently attracted substantial attention in various applications. However, their flexibility also incurs a substantial mathematical burden, which is usually circumvented by using well-known generic distributional approximations or simulations. This article provides a novel approximation method that tailors the dynamics of a time-homogeneous Markov jump process to meet those of its time-inhomogeneous counterpart on an increasingly fine Poisson grid. Strong convergence of the processes in terms of the Skorokhod J_1 metric is established, and convergence rates are provided. Under traditional regularity assumptions, distributional convergence is established for unconditional proxies, to the same limit. Special attention is devoted to the case where the target process has one absorbing state and the remaining ones transient, for which the absorption times also converge. Some applications are outlined, such as univariate hazard-rate density estimation, ruin probabilities, and multivariate phase-type density evaluation.Funding: M. Bladt and O. Peralta would like to acknowledge financial support from the Swiss National Science Foundation Project 200021_191984. O. Peralta acknowledges financial support from NSF Award #1653354 and AXA Research Fund Award on “Mitigating risk in the wake of the pandemic”.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"159 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139946213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Risk Sharing with Lambda Value at Risk","authors":"Peng Liu","doi":"10.1287/moor.2023.0246","DOIUrl":"https://doi.org/10.1287/moor.2023.0246","url":null,"abstract":"In this paper, we study the risk-sharing problem among multiple agents using lambda value at risk ([Formula: see text]) as their preferences via the tool of inf-convolution, where [Formula: see text] is an extension of value at risk ([Formula: see text]). We obtain explicit formulas for the inf-convolution of multiple [Formula: see text] with monotone Λ and explicit forms of the corresponding optimal allocations, extending the results on the inf-convolution of [Formula: see text]. It turns out that the inf-convolution of several [Formula: see text] is still a [Formula: see text] under a mild condition. Moreover, we investigate the inf-convolution of one [Formula: see text] and a general monotone risk measure without cash additivity, including [Formula: see text], expected utility, and rank-dependent expected utility as special cases. The expression of the inf-convolution and the explicit forms of the optimal allocation are derived, leading to a partial solution of the risk-sharing problem with multiple [Formula: see text] for general Λ functions. Finally, we discuss the risk-sharing problem with [Formula: see text], another definition of lambda value at risk. We focus on the inf-convolution of [Formula: see text] and a risk measure that is consistent with second-order stochastic dominance, deriving a very different expression for the inf-convolution and for the forms of the optimal allocations.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"234 1","pages":""},"PeriodicalIF":1.7,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139923182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}