Intelligent decision-making for a “Three-Variable” frequency-hopping pattern based on OC-CDRL

IF 2 | Zone 4, Computer Science | Q3 ENGINEERING, ELECTRICAL & ELECTRONIC

Physical Communication Pub Date : 2024-07-05 DOI:10.1016/j.phycom.2024.102434

Ziyu Meng, Shaogang Dai, Zhijin Zhao, Xueyi Ye, Shilian Zheng, Caiyi Lou, Xiaoniu Yang

Physical Communication, Volume 66, Article 102434. https://www.sciencedirect.com/science/article/pii/S1874490724001526

Citations: 0

Abstract

In existing frequency-hopping communication systems, the hopping pattern is not designed according to the electromagnetic interference environment, so anti-jamming is blind. To address this problem, a “three-variable” frequency-hopping pattern is proposed, in which the frequency, hopping rate, and instantaneous bandwidth of the frequency-hopping signal vary randomly according to the background electromagnetic interference. The decision-making problem for this pattern is modeled as a Markov decision process (MDP) by constructing the state-action-reward tuple. To alleviate the dimension explosion in decision-making, the frequency is designed to vary continuously within a small frequency band selected from a pseudo-random sequence, while the hopping rate and instantaneous bandwidth take discrete values. To solve this MDP efficiently, a combined deep reinforcement learning algorithm based on optimistic exploration and conservative estimation (OC-CDRL) is proposed. It combines features of the TD3 and D3QN algorithms and designs the corresponding states, actions, and rewards to handle the continuous and discrete action spaces, respectively. Because the D3QN algorithm tends to fall into local optima, an optimistic exploration strategy (OES) for action selection is proposed to increase exploration. Moreover, the loss function is improved by conservatively estimating state–action pairs outside the experience replay buffer, which reduces the overestimation of the optimistic action-value function and improves the stability and convergence of the algorithm. Comparative simulations in different electromagnetic interference environments show that the OC-CDRL algorithm effectively avoids most regions with higher interference and has better adaptability and anti-jamming capability.
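The hybrid action described in the abstract (a continuous frequency inside a pseudo-randomly selected sub-band, plus discrete hopping rate and instantaneous bandwidth) can be illustrated with a minimal sketch. The abstract does not give band edges, rate or bandwidth values, or the network output formats, so everything concrete here is an assumption: the sub-band table, the value lists, the names `decode_action` and `pseudo_random_band_sequence`, the [-1, 1] range of the continuous output, and the idea that a single discrete index selects rate and bandwidth jointly.

```python
import random
from dataclasses import dataclass

# Hypothetical placeholder values; none of these numbers come from the paper.
SUB_BANDS_MHZ = [(30.0, 35.0), (35.0, 40.0), (40.0, 45.0), (45.0, 50.0)]
HOP_RATES_HPS = [500, 1000, 2000]   # candidate hopping rates (hops per second)
BANDWIDTHS_KHZ = [25, 50, 100]      # candidate instantaneous bandwidths


@dataclass
class HopDecision:
    frequency_mhz: float
    hop_rate_hps: int
    bandwidth_khz: int


def pseudo_random_band_sequence(seed: int, length: int) -> list[int]:
    """Return a reproducible sequence of sub-band indices (assumed PRNG)."""
    rng = random.Random(seed)
    return [rng.randrange(len(SUB_BANDS_MHZ)) for _ in range(length)]


def decode_action(band_index: int, freq_action: float, discrete_action: int) -> HopDecision:
    """Combine one continuous and one discrete action into a hop decision.

    freq_action: value in [-1, 1], as a tanh-bounded actor (TD3-style) might emit.
    discrete_action: index over the rate x bandwidth grid, as an argmax over
    Q-values (D3QN-style) might yield. Both conventions are assumptions.
    """
    low, high = SUB_BANDS_MHZ[band_index]
    clipped = max(-1.0, min(1.0, freq_action))
    # Linearly map [-1, 1] onto the selected sub-band.
    frequency = low + (clipped + 1.0) / 2.0 * (high - low)
    # Split the flat index into (rate index, bandwidth index).
    rate_idx, bw_idx = divmod(discrete_action, len(BANDWIDTHS_KHZ))
    return HopDecision(frequency, HOP_RATES_HPS[rate_idx], BANDWIDTHS_KHZ[bw_idx])
```

Restricting the continuous action to one small sub-band per hop keeps the frequency decision one-dimensional, which is one way to read the abstract's remark about alleviating dimension explosion; the discrete grid here has only 3 x 3 = 9 entries.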




Source journal

Physical Communication (ENGINEERING, ELECTRICAL & ELECTRONIC; TELECOMMUNICATIONS)

CiteScore: 5.00

Self-citation rate: 9.10%

Publication volume: 212

Review time: 55 days

Journal description: PHYCOM: Physical Communication is an international and archival journal providing complete coverage of all topics of interest to those involved in all aspects of physical layer communications. Theoretical research contributions presenting new techniques, concepts or analyses, applied contributions reporting on experiences and experiments, and tutorials are published. Topics of interest include but are not limited to: Physical layer issues of Wireless Local Area Networks, WiMAX, Wireless Mesh Networks, Sensor and Ad Hoc Networks, PCS Systems; Radio access protocols and algorithms for the physical layer; Spread Spectrum Communications; Channel Modeling; Detection and Estimation; Modulation and Coding; Multiplexing and Carrier Techniques; Broadband Wireless Communications; Wireless Personal Communications; Multi-user Detection; Signal Separation and Interference Rejection; Multimedia Communications over Wireless; DSP Applications to Wireless Systems; Experimental and Prototype Results; Multiple Access Techniques; Space-time Processing; Synchronization Techniques; Error Control Techniques; Cryptography; Software Radios; Tracking; Resource Allocation and Inference Management; Multi-rate and Multi-carrier Communications; Cross-layer Design and Optimization; Propagation and Channel Characterization; OFDM Systems; MIMO Systems; Ultra-Wideband Communications; Cognitive Radio System Architectures; Platforms and Hardware Implementations for the Support of Cognitive Radio Systems; Cognitive Radio Resource Management and Dynamic Spectrum Sharing.