Artif. Intell. Pub Date: 2023-04-19 DOI: 10.48550/arXiv.2304.09700
Ziqiao Ao, Jinglai Li
{"title":"Entropy Estimation via Uniformization","authors":"Ziqiao Ao, Jinglai Li","doi":"10.48550/arXiv.2304.09700","DOIUrl":"https://doi.org/10.48550/arXiv.2304.09700","url":null,"abstract":"Entropy estimation is of practical importance in information theory and statistical science. Many existing entropy estimators suffer from fast growing estimation bias with respect to dimensionality, rendering them unsuitable for high-dimensional problems. In this work we propose a transform-based method for high-dimensional entropy estimation, which consists of the following two main ingredients. First by modifying the k-NN based entropy estimator, we propose a new estimator which enjoys small estimation bias for samples that are close to a uniform distribution. Second we design a normalizing flow based mapping that pushes samples toward a uniform distribution, and the relation between the entropy of the original samples and the transformed ones is also derived. As a result the entropy of a given set of samples is estimated by first transforming them toward a uniform distribution and then applying the proposed estimator to the transformed samples. The performance of the proposed method is compared against several existing entropy estimators, with both mathematical examples and real-world applications.","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"23 1","pages":"103954"},"PeriodicalIF":0.0,"publicationDate":"2023-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84791431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-12-30 DOI: 10.48550/arXiv.2301.01219
Franck Djeumou, Christian Ellis, Murat Cubuktepe, Craig T. Lennon, U. Topcu
{"title":"Task-Guided IRL in POMDPs that Scales","authors":"Franck Djeumou, Christian Ellis, Murat Cubuktepe, Craig T. Lennon, U. Topcu","doi":"10.48550/arXiv.2301.01219","DOIUrl":"https://doi.org/10.48550/arXiv.2301.01219","url":null,"abstract":"In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts. However, many existing IRL techniques make the often unrealistic assumption that the agent has access to full information about the environment. We remove this assumption by developing an algorithm for IRL in partially observable Markov decision processes (POMDPs). We address two limitations of existing IRL techniques. First, they require an excessive amount of data due to the information asymmetry between the expert and the learner. Second, most of these IRL techniques require solving the computationally intractable forward problem -- computing an optimal policy given a reward function -- in POMDPs. The developed algorithm reduces the information asymmetry while increasing the data efficiency by incorporating task specifications expressed in temporal logic into IRL. Such specifications may be interpreted as side information available to the learner a priori in addition to the demonstrations. Further, the algorithm avoids a common source of algorithmic complexity by building on causal entropy as the measure of the likelihood of the demonstrations as opposed to entropy. Nevertheless, the resulting problem is nonconvex due to the so-called forward problem. We solve the intrinsic nonconvexity of the forward problem in a scalable manner through a sequential linear programming scheme that guarantees to converge to a locally optimal policy. In a series of examples, including experiments in a high-fidelity Unity simulator, we demonstrate that even with a limited amount of data and POMDPs with tens of thousands of states, our algorithm learns reward functions and policies that satisfy the task while inducing similar behavior to the expert by leveraging the provided side information.","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"46 1","pages":"103856"},"PeriodicalIF":0.0,"publicationDate":"2022-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80005271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-09-01 DOI: 10.1016/j.artint.2022.103791
Jiarui Gan, E. Elkind, Sarit Kraus, M. Wooldridge
{"title":"Defense coordination in security games: Equilibrium analysis and mechanism design","authors":"Jiarui Gan, E. Elkind, Sarit Kraus, M. Wooldridge","doi":"10.1016/j.artint.2022.103791","DOIUrl":"https://doi.org/10.1016/j.artint.2022.103791","url":null,"abstract":"","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"48 1","pages":"103791"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81771618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-09-01 DOI: 10.1016/j.artint.2022.103792
Oskar Skibski, Takamasa Suzuki, Tomasz Grabowski, Y. Sakurai, Tomasz P. Michalak, M. Yokoo
{"title":"Measuring power in coalitional games with friends, enemies and allies","authors":"Oskar Skibski, Takamasa Suzuki, Tomasz Grabowski, Y. Sakurai, Tomasz P. Michalak, M. Yokoo","doi":"10.1016/j.artint.2022.103792","DOIUrl":"https://doi.org/10.1016/j.artint.2022.103792","url":null,"abstract":"","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"42 1","pages":"103792"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81025926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-09-01 DOI: 10.1016/j.artint.2022.103793
Davide Grossi, W. van der Hoek, Louwe B. Kuijer
{"title":"Reasoning about general preference relations","authors":"Davide Grossi, W. van der Hoek, Louwe B. Kuijer","doi":"10.1016/j.artint.2022.103793","DOIUrl":"https://doi.org/10.1016/j.artint.2022.103793","url":null,"abstract":"","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"31 1","pages":"103793"},"PeriodicalIF":0.0,"publicationDate":"2022-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84190819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-08-17 DOI: 10.48550/arXiv.2208.08345
Z. Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan G. Richens, Matt MacDermott, Tom Everitt
{"title":"Discovering Agents","authors":"Z. Kenton, Ramana Kumar, Sebastian Farquhar, Jonathan G. Richens, Matt MacDermott, Tom Everitt","doi":"10.48550/arXiv.2208.08345","DOIUrl":"https://doi.org/10.48550/arXiv.2208.08345","url":null,"abstract":"Causal models of agents have been used to analyse the safety aspects of machine learning systems. But identifying agents is non-trivial -- often the causal model is just assumed by the modeler without much justification -- and modelling failures can lead to mistakes in the safety analysis. This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way. From this we derive the first causal discovery algorithm for discovering agents from empirical data, and give algorithms for translating between causal models and game-theoretic influence diagrams. We demonstrate our approach by resolving some previous confusions caused by incorrect causal modelling of agents.","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"63 1","pages":"103963"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77130229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-08-01 DOI: 10.1016/j.artint.2022.103775
A. Zhitnikov, V. Indelman
{"title":"Simplified Risk-aware Decision Making with Belief-dependent Rewards in Partially Observable Domains","authors":"A. Zhitnikov, V. Indelman","doi":"10.1016/j.artint.2022.103775","DOIUrl":"https://doi.org/10.1016/j.artint.2022.103775","url":null,"abstract":"","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"100 1","pages":"103775"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73670195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-08-01 DOI: 10.1016/j.artint.2022.103771
L. Roveda, Andrea Testa, Asad Ali Shahid, F. Braghin, D. Piga
{"title":"Q-Learning-based model predictive variable impedance control for physical human-robot collaboration","authors":"L. Roveda, Andrea Testa, Asad Ali Shahid, F. Braghin, D. Piga","doi":"10.1016/j.artint.2022.103771","DOIUrl":"https://doi.org/10.1016/j.artint.2022.103771","url":null,"abstract":"","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"58 1","pages":"103771"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91345015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Artif. Intell. Pub Date: 2022-06-27 DOI: 10.48550/arXiv.2206.13319
Mathijs Schuurmans, Alexander Katriniok, Chris Meissen, H. E. Tseng, Panagiotis Patrinos
{"title":"Safe, Learning-Based MPC for Highway Driving under Lane-Change Uncertainty: A Distributionally Robust Approach","authors":"Mathijs Schuurmans, Alexander Katriniok, Chris Meissen, H. E. Tseng, Panagiotis Patrinos","doi":"10.48550/arXiv.2206.13319","DOIUrl":"https://doi.org/10.48550/arXiv.2206.13319","url":null,"abstract":"We present a case study applying learning-based distributionally robust model predictive control to highway motion planning under stochastic uncertainty of the lane change behavior of surrounding road users. The dynamics of road users are modelled using Markov jump systems, in which the switching variable describes the desired lane of the vehicle under consideration and the continuous state describes the pose and velocity of the vehicles. We assume the switching probabilities of the underlying Markov chain to be unknown. As the vehicle is observed and thus, samples from the Markov chain are drawn, the transition probabilities are estimated along with an ambiguity set which accounts for misestimations of these probabilities. Correspondingly, a distributionally robust optimal control problem is formulated over a scenario tree, and solved in receding horizon. As a result, a motion planning procedure is obtained which through observation of the target vehicle gradually becomes less conservative while avoiding overconfidence in estimates obtained from small sample sizes. We present an extensive numerical case study, comparing the effects of several different design aspects on the controller performance and safety.","PeriodicalId":8496,"journal":{"name":"Artif. Intell.","volume":"25 1","pages":"103920"},"PeriodicalIF":0.0,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78878269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}