{"title":"Robustness of Markov perfect equilibrium to model approximations in general-sum dynamic games","authors":"Jayakumar Subramanian, Amit Sinha, Aditya Mahajan","doi":"10.1109/ICC54714.2021.9703156","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9703156","url":null,"abstract":"Dynamic games (also called stochastic games or Markov games) are an important class of games for modeling multi-agent interactions. In many situations, the dynamics and reward functions of the game are learnt from past data and are therefore approximate. In this paper, we study the robustness of Markov perfect equilibrium to approximations in reward and transition functions. Using approximation results from Markov decision processes, we show that the Markov perfect equilibrium of an approximate (or perturbed) game is always an approximate Markov perfect equilibrium of the original game. We provide explicit bounds on the approximation error in terms of three quantities: (i) the error in approximating the reward functions, (ii) the error in approximating the transition function, and (iii) a property of the value function of the MPE of the approximate game. The second and third quantities depend on the choice of metric on probability spaces. We also present coarser upper bounds which do not depend on the value function but only depend on the properties of the reward and transition functions of the approximate game. We illustrate the results via a numerical example.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127861974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Nonlinear Adaptive Control for Control of Nuclear Reactor","authors":"P. S. Reddy, S. Shimjith, A. Tiwari, S. Kar","doi":"10.1109/ICC54714.2021.9703154","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9703154","url":null,"abstract":"This paper presents a nonlinear dynamic inversion based Model Reference Adaptive Control (MRAC) for nuclear reactor power control. A point kinetics model of a Pressurized Water Reactor with parameter uncertainties in thermal feedback coefficients and bounded reactivity disturbance is considered. The parameter uncertainties and disturbances are estimated based on the certainty equivalence principle. Stability and boundedness of the closed-loop signals are verified through Lyapunov theory. Robustness of the MRAC is improved with a projection-based modification. Finally, the efficacy of the proposed adaptive controller is verified through nonlinear simulation studies.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"IM-35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126636842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generalized algorithm and framework for online 3-dimensional bin packing in an automated sorting center","authors":"Ankush Ojha, Marichi Agarwal, Aniruddha Singhal, Chayan Sarkar, Supratim Ghosh, Rajesh Sinha","doi":"10.1109/ICC54714.2021.9703142","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9703142","url":null,"abstract":"The online 3-dimensional bin packing problem (O3D-BPP) is gaining renewed prominence due to the industrial automation brought by Industry 4.0. However, due to limited attention in the past and its challenging nature, good approximate algorithms are scarce compared to the 1D and 2D problems. This paper considers real-time O3D-BPP of cuboidal boxes with partial information (look-ahead) in an automated robotic sorting center. We present two rolling-horizon mixed-integer linear programming (MILP) cum-heuristic based algorithms: MPack (for bench-marking) and MPackLite (for real-time deployment). Additionally, we present a framework OPack that adapts and improves the performance of BP heuristics by utilizing information in an online setting with a look-ahead. We then perform a comparative analysis of BP heuristics (with and without OPack), MPack, and MPackLite on synthetic and industry-provided data with increasing look-ahead. MPackLite and the baseline heuristics perform within bounds of robot operations and thus can be used in real-time.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132821659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dynamic Programming Formulation for the Nonlinear Filter","authors":"Jin W. Kim, P. Mehta","doi":"10.1109/ICC54714.2021.9703115","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9703115","url":null,"abstract":"This paper builds on our recent work where we presented a dual stochastic optimal control formulation of the nonlinear filtering problem [1]. The constraint for the dual problem is a backward stochastic differential equation (BSDE). The solution is obtained via an application of the maximum principle (MP). In the present paper, a dynamic programming (DP) principle is presented for a special class of BSDE-constrained stochastic optimal control problems. The principle is applied to derive the solution of the nonlinear filtering problem.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114068620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning","authors":"Gugan Thoppe, Bhumesh Kumar","doi":"10.1109/ICC54714.2021.9702912","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9702912","url":null,"abstract":"In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared problem in sequential decision-making. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"398 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116524664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cooperative Target Capture Using Predefined-Time Consensus","authors":"Abhinav Sinha, S. R. Kumar","doi":"10.1109/ICC54714.2021.9703170","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9703170","url":null,"abstract":"This paper presents predefined-time consensus-based cooperative guidance laws for a swarm of interceptors to simultaneously capture a non-maneuvering target. Unlike leader-follower cooperative guidance techniques, we design laws for a swarm of interceptors that has no leader, in which each interceptor executes its own distributed cooperative guidance command. This removes the dependence of the mission on a single interceptor. First, we present the cooperative guidance command against a stationary target, and then extend the proposed design using two different approaches for simultaneous interception of a target moving with constant speed. We rigorously show that the proposed cooperative guidance laws guarantee consensus in the interceptors’ time-to-go values within a predefined time. The proposed design allows a feasible time of consensus in time-to-go to be set arbitrarily during the design regardless of the interceptors’ initial time-to-go values, thereby ensuring a simultaneous interception in various engagement scenarios. We also demonstrate the efficacy of the proposed design via simulations.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130128643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zeroth-order randomized block methods for constrained minimization of expectation-valued Lipschitz continuous functions","authors":"U. Shanbhag, Farzad Yousefian","doi":"10.1109/ICC54714.2021.9703135","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9703135","url":null,"abstract":"We consider the minimization of an $L_{0}$-Lipschitz continuous and expectation-valued function, denoted by $f$ and defined as $f(\\mathrm{x}) \\triangleq \\mathbb{E}[\\tilde{f}(\\mathrm{x}, \\omega)]$, over a Cartesian product of closed and convex sets with a view towards obtaining both asymptotics as well as rate and complexity guarantees for computing an approximate stationary point (in a Clarke sense). We adopt a smoothing-based approach reliant on minimizing $f_{\\eta}$ where $f_{\\eta}(\\mathrm{x}) \\triangleq \\mathbb{E}_{u}[f(\\mathrm{x}+\\eta u)]$, $u$ is a random variable defined on a unit sphere, and $\\eta > 0$. In fact, it is observed that a stationary point of the $\\eta$-smoothed problem is a $2\\eta$-stationary point for the original problem in the Clarke sense. In such a setting, we derive a suitable residual function that provides a metric for stationarity for the smoothed problem. By leveraging a zeroth-order framework reliant on utilizing sampled function evaluations implemented in a block-structured regime, we make two sets of contributions for the sequence generated by the proposed scheme. (i) The residual function of the smoothed problem tends to zero almost surely along the generated sequence; (ii) To compute an $\\mathrm{x}$ that ensures that the expected norm of the residual of the $\\eta$-smoothed problem is within $\\epsilon$ requires no greater than $\\mathcal{O}(\\frac{1}{\\eta\\epsilon^{2}})$ projection steps and $\\mathcal{O}\\left(\\frac{1}{\\eta^{2}\\epsilon^{4}}\\right)$ function evaluations. These statements appear to be novel with few related results available to contend with general nonsmooth, nonconvex, and stochastic regimes via zeroth-order approaches.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121645879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning event-driven switched linear systems","authors":"A. Kundu, P. Prabhakar","doi":"10.1109/ICC54714.2021.9703130","DOIUrl":"https://doi.org/10.1109/ICC54714.2021.9703130","url":null,"abstract":"We propose an automata theoretic learning algorithm for the identification of black-box switched linear systems whose switching logics are event-driven. A switched system is expressed by a deterministic finite automaton (FA) whose node labels are the subsystem matrices. With information about the dimensions of the matrices and the set of events, and with access to two oracles, one that can simulate the system on a given input and one that provides counter-examples when given an incorrect hypothesis automaton, we provide an algorithm that outputs the unknown FA. Our algorithm first uses the oracle to obtain the node labels of the system run on a given input sequence of events, and then extends Angluin's $L^{*}$-algorithm to determine the FA that accepts the language of the given FA. We demonstrate our learning algorithm on a numerical example.","PeriodicalId":382373,"journal":{"name":"2021 Seventh Indian Control Conference (ICC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131198901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}