{"authors":"Fang Fu, Qi Jiao, F. Yu, Zhicai Zhang, Jianbo Du","doi":"10.1109/ICCWorkshops50388.2021.9473714","journal":{"name":"2021 IEEE International Conference on Communications Workshops (ICC Workshops)"},"publicationDate":"2021-06-01","citationCount":"5","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ICCWorkshops50388.2021.9473714"}
Securing UAV-to-Vehicle Communications: A Curiosity-Driven Deep Q-learning Network (C-DQN) Approach
Unmanned aerial vehicles (UAVs) will open up new application fields in smart city-based intelligent transportation systems (ITSs), e.g., traffic management, disaster rescue, and police patrol. However, the broadcast and line-of-sight nature of air-to-ground wireless channels poses a new challenge to the information security of UAV-to-vehicle (U2V) communications. This paper considers U2V communications subject to multiple eavesdroppers on the ground in urban scenarios. We aim to maximize the secrecy rate from a physical layer security perspective, subject to energy consumption and flight zone constraints, by jointly optimizing the UAV's trajectory, the transmit power of the UAV, and the jamming power sent by the roadside unit (RSU). This joint optimization problem is modeled as a Markov decision process (MDP) that accounts for the time-varying characteristics of the wireless channels. A curiosity-driven deep reinforcement learning (DRL) algorithm is then used to solve the MDP: the agent is reinforced both by an extrinsic reward supplied by the environment and by an intrinsic reward defined as the prediction error of the consequences of its own actions. Extensive simulation results show that, compared to DRL without intrinsic rewards, the proposed scheme performs better in terms of average reward, learning efficiency, and generalization to other scenarios.
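The abstract gives no formulas, so the following is only a minimal sketch of the two quantities it names, under stated assumptions. The secrecy rate is assumed to take the form commonly used in physical layer security (legitimate rate minus the strongest eavesdropper's rate, clipped at zero). The intrinsic reward is assumed to be the squared error of a forward model's next-state prediction. The function names, the weight `eta`, and the squared-error form are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def secrecy_rate(snr_vehicle: float, snr_eavesdroppers: Sequence[float]) -> float:
    """Worst-case secrecy rate in bit/s/Hz (assumed form, clipped at zero)."""
    rate_vehicle = math.log2(1.0 + snr_vehicle)
    # With several eavesdroppers, the strongest one bounds the leakage.
    rate_eve = max(math.log2(1.0 + s) for s in snr_eavesdroppers)
    return max(rate_vehicle - rate_eve, 0.0)


def intrinsic_reward(predicted_next: Sequence[float],
                     actual_next: Sequence[float]) -> float:
    """Curiosity bonus: half the squared next-state prediction error (assumed form)."""
    return 0.5 * sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def total_reward(extrinsic: float,
                 predicted_next: Sequence[float],
                 actual_next: Sequence[float],
                 eta: float = 0.1) -> float:
    """Extrinsic reward plus a weighted curiosity bonus; eta is a hypothetical weight."""
    return extrinsic + eta * intrinsic_reward(predicted_next, actual_next)
```

In a DQN-style training loop, `total_reward` would stand in for the environment reward when computing the temporal-difference target, so that transitions the forward model predicts poorly are visited more often early in training.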