Resource Allocation for Multi-Target Radar Tracking via Constrained Deep Reinforcement Learning

Impact Factor 7.4 | Region 1 (Computer Science) | JCR Q1 (TELECOMMUNICATIONS)
Ziyang Lu; M. Cenk Gursoy
{"title":"通过受限深度强化学习实现多目标雷达跟踪的资源分配","authors":"Ziyang Lu;M. Cenk Gursoy","doi":"10.1109/TCCN.2023.3304634","DOIUrl":null,"url":null,"abstract":"In this paper, multi-target tracking in a radar system is considered, and adaptive radar resource management is addressed. In particular, time management in tracking multiple maneuvering targets subject to budget constraints is studied with the goal to minimize the total tracking cost of all targets (or equivalently to maximize the tracking accuracies). The constrained optimization of the dwell time allocation to each target is addressed via deep Q-network (DQN) based reinforcement learning. In the proposed constrained deep reinforcement learning (CDRL) algorithm, both the parameters of the DQN and the dual variable are learned simultaneously. The proposed CDRL framework consists of two components, namely online CDRL and offline CDRL. Training a DQN in the deep reinforcement learning algorithm usually requires a large amount of data, which may not be available in a target tracking task due to the scarcity of measurements. We address this challenge by proposing an offline CDRL framework, in which the algorithm evolves in a virtual environment generated based on the current observations and prior knowledge of the environment. Simulation results show that both offline CDRL and online CDRL are critical for effective target tracking and resource utilization. Offline CDRL provides more training data to stabilize the learning process and the online component can sense the change in the environment and make the corresponding adaptation. Furthermore, a hybrid CDRL algorithm that combines offline CDRL and online CDRL is proposed to reduce the computational burden by performing offline CDRL only periodically to stabilize the training process of the online CDRL.","PeriodicalId":13069,"journal":{"name":"IEEE Transactions on Cognitive Communications and Networking","volume":"9 6","pages":"1677-1690"},"PeriodicalIF":7.4000,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Resource Allocation for Multi-Target Radar Tracking via Constrained Deep Reinforcement Learning\",\"authors\":\"Ziyang Lu;M. Cenk Gursoy\",\"doi\":\"10.1109/TCCN.2023.3304634\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, multi-target tracking in a radar system is considered, and adaptive radar resource management is addressed. In particular, time management in tracking multiple maneuvering targets subject to budget constraints is studied with the goal to minimize the total tracking cost of all targets (or equivalently to maximize the tracking accuracies). The constrained optimization of the dwell time allocation to each target is addressed via deep Q-network (DQN) based reinforcement learning. In the proposed constrained deep reinforcement learning (CDRL) algorithm, both the parameters of the DQN and the dual variable are learned simultaneously. The proposed CDRL framework consists of two components, namely online CDRL and offline CDRL. Training a DQN in the deep reinforcement learning algorithm usually requires a large amount of data, which may not be available in a target tracking task due to the scarcity of measurements. We address this challenge by proposing an offline CDRL framework, in which the algorithm evolves in a virtual environment generated based on the current observations and prior knowledge of the environment. 
Simulation results show that both offline CDRL and online CDRL are critical for effective target tracking and resource utilization. Offline CDRL provides more training data to stabilize the learning process and the online component can sense the change in the environment and make the corresponding adaptation. Furthermore, a hybrid CDRL algorithm that combines offline CDRL and online CDRL is proposed to reduce the computational burden by performing offline CDRL only periodically to stabilize the training process of the online CDRL.\",\"PeriodicalId\":13069,\"journal\":{\"name\":\"IEEE Transactions on Cognitive Communications and Networking\",\"volume\":\"9 6\",\"pages\":\"1677-1690\"},\"PeriodicalIF\":7.4000,\"publicationDate\":\"2023-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Cognitive Communications and Networking\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10215369/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"TELECOMMUNICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cognitive Communications and Networking","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10215369/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

In this paper, multi-target tracking in a radar system is considered, and adaptive radar resource management is addressed. In particular, time management in tracking multiple maneuvering targets subject to budget constraints is studied, with the goal of minimizing the total tracking cost over all targets (or, equivalently, maximizing the tracking accuracy). The constrained optimization of the dwell-time allocation to each target is addressed via deep Q-network (DQN) based reinforcement learning. In the proposed constrained deep reinforcement learning (CDRL) algorithm, the parameters of the DQN and the dual variable are learned simultaneously. The proposed CDRL framework consists of two components, namely online CDRL and offline CDRL. Training a DQN in a deep reinforcement learning algorithm usually requires a large amount of data, which may not be available in a target tracking task due to the scarcity of measurements. We address this challenge by proposing an offline CDRL framework, in which the algorithm evolves in a virtual environment generated from the current observations and prior knowledge of the environment. Simulation results show that both offline CDRL and online CDRL are critical for effective target tracking and resource utilization: offline CDRL provides more training data to stabilize the learning process, while the online component senses changes in the environment and adapts accordingly. Furthermore, a hybrid CDRL algorithm that combines offline CDRL and online CDRL is proposed to reduce the computational burden by performing offline CDRL only periodically to stabilize the training process of the online CDRL.
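The abstract states that the DQN parameters and a dual variable are learned simultaneously so that the time-budget constraint is respected. The sketch below illustrates one way such a primal-dual update could look, assuming a Lagrangian relaxation in which a per-step resource cost is weighted by the dual variable; the class and function names (DwellQNet, cdrl_update), the network architecture, and the update schedule are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch (not the authors' implementation) of a primal-dual DQN update
# for constrained dwell-time allocation. The Q-network is trained on the
# Lagrangian reward r - lam * c, and the dual variable lam is updated by
# projected gradient ascent on the budget constraint. All names and the
# architecture below are illustrative assumptions.

import torch
import torch.nn as nn


class DwellQNet(nn.Module):
    """Hypothetical Q-network: track state -> Q-values over discrete dwell-time actions."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


def cdrl_update(qnet, target_qnet, optimizer, batch, lam, dual_lr, budget, gamma=0.99):
    """One primal-dual step: DQN regression on the Lagrangian target, then dual ascent.

    batch = (states, actions, rewards, costs, next_states), where `rewards` is the
    negative tracking cost, `costs` is the per-step time/resource usage, and
    `actions` is a LongTensor of dwell-time indices.
    """
    s, a, r, c, s_next = batch

    # Fold the constraint into the return via the Lagrangian reward.
    r_lag = r - lam * c

    # Standard DQN target with a frozen target network.
    with torch.no_grad():
        target = r_lag + gamma * target_qnet(s_next).max(dim=1).values
    q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)

    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Dual ascent: raise lam when the average cost exceeds the budget,
    # lower it otherwise, and project onto lam >= 0.
    lam = max(0.0, lam + dual_lr * (c.mean().item() - budget))
    return lam, loss.item()
```

In this sketch the constraint is enforced by projected dual ascent on the gap between the average per-step cost and the budget; the paper's exact constraint formulation, state/action definitions, and offline/online training loop may differ.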
Source journal
IEEE Transactions on Cognitive Communications and Networking
Category: Computer Science - Artificial Intelligence
CiteScore: 15.50
Self-citation rate: 7.00%
Articles published: 108
Journal description: The IEEE Transactions on Cognitive Communications and Networking (TCCN) aims to publish high-quality manuscripts that push the boundaries of cognitive communications and networking research. Cognitive, in this context, refers to the application of perception, learning, reasoning, memory, and adaptive approaches in communication system design. The transactions welcome submissions that explore various aspects of cognitive communications and networks, focusing on innovative and holistic approaches to complex system design. Key topics covered include architecture, protocols, cross-layer design, and cognition cycle design for cognitive networks. Additionally, research on machine learning, artificial intelligence, end-to-end and distributed intelligence, software-defined networking, cognitive radios, spectrum sharing, and security and privacy issues in cognitive networks is of interest. The publication also encourages papers addressing novel services and applications enabled by these cognitive concepts.