{"title":"Data Efficient Learning of Robust Control Policies","authors":"Susmit Jha, P. Lincoln","doi":"10.1109/ALLERTON.2018.8636072","DOIUrl":"https://doi.org/10.1109/ALLERTON.2018.8636072","url":null,"abstract":"This paper investigates data-efficient methods for learning robust control policies. Reinforcement learning has emerged as an effective approach to learn control policies by interacting directly with the plant, but it requires a significant number of example trajectories to converge to the optimal policy. Combining model-free reinforcement learning with model-based control methods achieves better data efficiency via simultaneous system identification and controller synthesis. We study a novel approach that exploits the existence of approximate physics models to accelerate the learning of control policies. The proposed approach consists of iterating through three key steps: evaluating a selected policy on the real-world plant and recording trajectories, building a Gaussian process model to predict the reality gap of a parametric physics model in the neighborhood of the selected policy, and synthesizing a new policy using reinforcement learning on the refined physics model that most likely approximates the real plant. The approach converges to an optimal policy as well as an approximate physics model. The real-world experiments are limited to evaluating only promising candidate policies, and the use of Gaussian processes minimizes the number of required real-world trajectories. We demonstrate the effectiveness of our techniques on a set of simulation case studies using OpenAI Gym environments.","PeriodicalId":175228,"journal":{"name":"Allerton Conference on Communication, Control, and Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114068652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Potential conditional mutual information: Estimators and properties","authors":"Arman Rahimzamani, Sreeram Kannan","doi":"10.1109/ALLERTON.2017.8262877","DOIUrl":"https://doi.org/10.1109/ALLERTON.2017.8262877","url":null,"abstract":"The conditional mutual information I(X;Y|Z) measures the average information that X and Y contain about each other given Z. This is an important primitive in many learning problems including conditional independence testing, graphical model inference, causal strength estimation and time-series problems. In several applications, it is desirable to have a functional purely of the conditional distribution p_{Y|X,Z} rather than of the joint distribution p_{X,Y,Z}. We define the potential conditional mutual information as the conditional mutual information calculated with a modified joint distribution p_{Y|X,Z} q_{X,Z}, where q_{X,Z} is a potential distribution, fixed a priori. We develop k-nearest-neighbor based estimators for this functional, employing importance sampling and a coupling trick, and prove the finite-k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application in dynamical system inference.","PeriodicalId":175228,"journal":{"name":"Allerton Conference on Communication, Control, and Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129839184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the degrees of freedom of SISO interference and X channels with delayed CSIT","authors":"Mohammad Javad Abdoli, A. Ghasemi, A. Khandani","doi":"10.1109/Allerton.2011.6120226","DOIUrl":"https://doi.org/10.1109/Allerton.2011.6120226","url":null,"abstract":"The SISO (single-input single-output) AWGN interference and X channels in an i.i.d. fading environment are considered, where the transmitters have the past channel state information (CSI) through noiseless feedback links. New transmission schemes are proposed for these channels that achieve degrees of freedom (DoF) values greater than one (except for the two-user interference channel). The achieved DoFs are strictly increasing with the number of users and asymptotically approach limiting values of ≈ 1.2663 and ≈ 1.4427 for interference and X channels, respectively. The achieved DoFs are greater than the best previously reported DoFs for these channels with delayed CSI at transmitters.","PeriodicalId":175228,"journal":{"name":"Allerton Conference on Communication, Control, and Computing","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121318263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}