Model-Free Distributed Reinforcement Learning State Estimation of a Dynamical System Using Integral Value Functions

Babak Salamat; Gerhard Elsbacher; Andrea M. Tonello; Lenz Belzner

IEEE Open Journal of Control Systems, vol. 2, pp. 70–78, published 2023-02-27. DOI: 10.1109/OJCSYS.2023.3250089.
Open-access PDF: https://ieeexplore.ieee.org/iel7/9552933/9973428/10054475.pdf
Abstract: One of the challenging problems in sensor network systems is to estimate and track the state of a target point mass with unknown dynamics. Recent advances in deep learning (DL) have renewed interest in applying DL techniques to state estimation problems. In much of this work, however, process noise is absent, which suggests that the point-mass target must be non-maneuvering, since process noise is typically as significant as measurement noise when tracking maneuvering targets. In this paper, we propose a continuous-time (CT), model-free (or model-building) distributed reinforcement learning estimator (DRLE) that uses an integral value function in sensor networks. The DRLE algorithm learns an optimal policy from a neural value function whose aim is to estimate the state of the target point mass. The proposed estimator consists of two high-pass consensus filters, one acting on the weighted measurements and one on the inverse-covariance matrices, together with a critic reinforcement learning mechanism at each node of the network. The efficiency of the proposed DRLE is demonstrated in a simulation of a network of underactuated vertical takeoff and landing (VTOL) aircraft with strong input coupling. The experiment highlights two advantages of DRLE: i) it does not require the dynamic model to be known, and ii) it is an order of magnitude faster than the state-dependent Riccati equation (SDRE) baseline.
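The abstract describes the node-level structure of the DRLE (two high-pass consensus filters on weighted measurements and inverse-covariance matrices, plus a critic trained against an integral value function) without giving the update equations. The following Python sketch is a minimal, hypothetical illustration of that structure under strong simplifying assumptions: a scalar target, identity measurement model (H = R = I), a hand-picked quadratic feature map, and a discrete-time dynamic-average consensus step standing in for the paper's continuous-time high-pass filters. The function names, step sizes, and network topology are illustrative choices, not taken from the paper.

```python
# Minimal sketch (not the authors' implementation): one node update combining
# (a) a discrete-time dynamic-average ("high-pass") consensus step on local
#     weighted measurements, and
# (b) a linear-in-features critic updated with an integral Bellman-style
#     temporal-difference target.
import numpy as np


def consensus_step(x, u, u_prev, neighbors, eps=0.1):
    """One dynamic-average consensus step for all nodes.

    x         : (N, d) current consensus states
    u, u_prev : (N, d) current and previous local inputs (e.g. H^T R^-1 z)
    neighbors : list of neighbor-index lists, one per node
    """
    x_new = x.copy()
    for i, nbrs in enumerate(neighbors):
        diffusion = sum(x[j] - x[i] for j in nbrs)
        x_new[i] = x[i] + eps * diffusion + (u[i] - u_prev[i])
    return x_new


def critic_update(w, phi, e_k, e_k1, integral_reward, lr=1e-2):
    """Integral-value-function critic update for one node.

    V(e) ~= w @ phi(e); the target is the running cost integrated over
    [t_k, t_{k+1}] plus the value at the next estimation error e_k1.
    """
    td_error = w @ phi(e_k) - (integral_reward + w @ phi(e_k1))
    return w - lr * td_error * phi(e_k)


# Toy usage on a 3-node ring tracking a scalar, static target.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    neighbors = [[1, 2], [0, 2], [0, 1]]
    target = 1.0                                  # unknown point-mass state
    x = np.zeros((3, 1))                          # consensus on weighted measurements
    u_prev = np.zeros((3, 1))
    w = np.zeros(2)                               # critic weights
    phi = lambda e: np.array([e, e ** 2])         # assumed feature map

    for k in range(200):
        z = target + 0.05 * rng.standard_normal((3, 1))  # noisy local measurements
        u = z                                            # with H = R = I, u_i = H^T R^-1 z_i
        e_k = (target - x[0]).item()                     # node-0 error before the update
        x = consensus_step(x, u, u_prev, neighbors)
        u_prev = u
        e_k1 = (target - x[0]).item()                    # node-0 error after the update
        # quadratic running cost integrated over one step of length dt = 0.01
        w = critic_update(w, phi, e_k, e_k1, integral_reward=0.01 * e_k ** 2)

    print("consensus estimates:", x.ravel(), "critic weights:", w)
```

The dynamic-average form is chosen here because it needs only neighbor differences and the change in the local input, mirroring the distributed, model-free flavor of the abstract; the actual DRLE operates in continuous time and maintains a separate consensus filter for the inverse-covariance matrices, which this toy example omits.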