Intelligent Anti-Jamming Decision Algorithm of Bivariate Frequency Hopping Pattern Based on DQN With PER and Pareto
Authors: Jiasheng Zhu, Zhijin Zhao, Shilian Zheng
Journal: Int. J. Inf. Technol. Web Eng. (published 2022-01-01)
DOI: https://doi.org/10.4018/ijitwe.297970
To improve the anti-jamming performance of frequency hopping (FH) systems in complex electromagnetic environments, this paper proposes PPER-DQN, a Deep Q-Network (DQN) algorithm with prioritized experience replay (PER) based on Pareto samples, which makes intelligent decisions on the bivariate FH pattern. The system model, state-action space, and reward function are designed around the main parameters of the FH pattern, and the DQN is used to make the pattern more flexible. Using the definition of Pareto dominance, a PER scheme based on both the TD-error and the immediate reward is proposed. To keep the training set diverse, it is formed from the Pareto sample set plus several random samples. When Pareto samples are selected, a confidence coefficient is introduced to adjust their priority. This preserves the learning value of the training set and improves the learning efficiency of the DQN. Simulation results show that the efficiency, convergence speed, and stability of the algorithm are improved, and that the generated bivariate FH pattern outperforms the conventional FH pattern.
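The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each stored transition is summarized by two objectives, the absolute TD-error and the immediate reward, and that larger is better for both. The names `dominates`, `pareto_front`, and `build_batch` are hypothetical. The confidence coefficient is left out because the abstract does not define it.

```python
import random
from typing import List, Sequence, Tuple

# Assumption: one replay sample is summarized as (|TD-error|, immediate reward).
Sample = Tuple[float, float]


def dominates(a: Sample, b: Sample) -> bool:
    """True if a is no worse than b in both objectives and strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(samples: Sequence[Sample]) -> List[int]:
    """Return indices of samples that no other sample dominates."""
    return [
        i
        for i, s in enumerate(samples)
        if not any(dominates(t, s) for j, t in enumerate(samples) if j != i)
    ]


def build_batch(samples: Sequence[Sample], batch_size: int,
                rng: random.Random) -> List[int]:
    """Form a training batch from the Pareto set plus uniformly random samples."""
    front = pareto_front(samples)
    if len(front) > batch_size:
        # Hypothetical tie-break: subsample the front uniformly if it is too large.
        front = rng.sample(front, batch_size)
    chosen = set(front)
    rest = [i for i in range(len(samples)) if i not in chosen]
    # Random fill keeps the batch diverse, as the abstract describes.
    fill = rng.sample(rest, min(batch_size - len(front), len(rest)))
    return front + fill


if __name__ == "__main__":
    buffer = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.05, 0.05), (0.3, 0.2)]
    print(pareto_front(buffer))
    print(build_batch(buffer, 5, random.Random(0)))
```

The pairwise dominance check is quadratic in the buffer size, which is acceptable for a sketch; a practical replay buffer would likely maintain the Pareto set incrementally.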