Model-Free Distributed Reinforcement Learning State Estimation of a Dynamical System Using Integral Value Functions

IEEE Open Journal of Control Systems, vol. 2, pp. 70-78. Pub Date: 2023-02-27. DOI: 10.1109/OJCSYS.2023.3250089

Babak Salamat; Gerhard Elsbacher; Andrea M. Tonello; Lenz Belzner


Citations: 1

Abstract

One of the challenging problems in sensor network systems is estimating and tracking the state of a target point mass with unknown dynamics. Recent advances in deep learning (DL) have renewed interest in applying DL techniques to state estimation problems. In such approaches, however, process noise is absent, which seems to imply that the point-mass target must be non-maneuvering, since process noise is typically as significant as measurement noise when tracking maneuvering targets. In this paper, we propose a continuous-time (CT) model-free or model-building distributed reinforcement learning estimator (DRLE) that uses an integral value function in sensor networks. The DRLE algorithm learns an optimal policy from a neural value function that aims to provide an estimate of the target point mass. The proposed estimator consists of two high-pass consensus filters, operating on weighted measurements and inverse-covariance matrices, and a critic reinforcement learning mechanism for each node in the network. The efficiency of the proposed DRLE is demonstrated in a simulation experiment on a network of underactuated vertical takeoff and landing aircraft with strong input coupling. The experiment highlights two advantages of DRLE: i) it does not require the dynamic model to be known, and ii) it is an order of magnitude faster than the state-dependent Riccati equation (SDRE) baseline.
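
To give a feel for what a consensus filter does inside such an estimator, the following is a minimal sketch of one form often described as a high-pass dynamic consensus filter: each node i keeps an internal state q_i driven by q_i' = beta * sum_j [(q_j + u_j) - (q_i + u_i)] over its neighbors j, and outputs x_i = q_i + u_i. The abstract does not state the filter equations used in the paper, so this form, the ring topology, the gain beta, the Euler step, the scalar inputs, and all function names are assumptions chosen for illustration only. In the paper the filters act on weighted measurements and inverse-covariance matrices rather than on scalars.

```python
def ring_neighbors(n):
    """Neighbor lists for an undirected ring of n nodes (assumed topology)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def high_pass_consensus(u, neighbors, beta=1.0, dt=0.01, steps=1000):
    """Euler-integrate q_i' = beta * sum_j [(q_j + u_j) - (q_i + u_i)].

    u holds one constant scalar input per node (hypothetical data).
    Each node outputs x_i = q_i + u_i; on a connected undirected graph
    the outputs approach the network-wide average of the inputs.
    """
    n = len(u)
    q = [0.0] * n  # internal filter states, started at zero
    for _ in range(steps):
        x = [q[i] + u[i] for i in range(n)]  # current node outputs
        dq = [beta * sum(x[j] - x[i] for j in neighbors[i]) for i in range(n)]
        q = [q[i] + dt * dq[i] for i in range(n)]
    return [q[i] + u[i] for i in range(n)]


measurements = [1.0, 2.0, 3.0, 6.0]  # made-up local measurements
estimates = high_pass_consensus(measurements, ring_neighbors(4))
print(estimates)  # every entry should be close to the input average
```

Because the neighbor exchange is symmetric, the sum of the internal states stays at zero, so the node outputs can only agree on the average of the inputs. Each node needs only its neighbors' outputs, which is what makes the scheme distributed.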
