HQA: Hybrid Q-learning and AODV multi-path routing algorithm for Flying Ad-hoc Networks
Chen Sun, Liang Hou, Suqi Yu, Jian Shu
Vehicular Communications, Volume 55, Article 100947
Published: 2025-06-23
DOI: 10.1016/j.vehcom.2025.100947
Citations: 0
Abstract
Reliable and efficient data transmission between Unmanned Aerial Vehicle (UAV) nodes is critical for the control of UAV swarms and relies heavily on effective routing protocols in Flying Ad-hoc Networks (FANETs). However, Q-learning-based FANET routing protocols, which are gaining widespread attention, face two significant challenges: 1) the insufficient stability of Q-learning leads to unreliable route selection in certain scenarios and higher packet loss rates; 2) in void regions with frequent topology changes and vast path exploration spaces, the slow convergence of Q-learning cannot adapt quickly to dynamic environmental changes, thereby reducing the packet delivery rate (PDR). This paper proposes a hybrid Q-learning/AODV (HQA) multi-path routing algorithm that integrates the Q-learning and AODV protocols to address these challenges. HQA includes a Bayesian stability evaluator for adaptive Q-learning/AODV switching and a dual-update reward mechanism that integrates reliable AODV paths into Q-learning training, enabling rapid void recovery and latency-optimized routing. Experimental results demonstrate HQA's superiority over baseline protocols: compared to AODV, HQA reduces average end-to-end delay by 13.6–23.9% and improves PDR by 5.4–9.1% in non-void and void states, respectively. It outperforms QMR by 2.2–6.3% in PDR, while achieving 25.6% and 53.2% higher average PDR than QMR and AODV, respectively, across network densities. The hybrid design accelerates convergence by 40% versus standalone Q-learning through AODV-assisted rewards, maintaining scalability under dynamic topology changes. These findings indicate that the HQA algorithm can adapt more rapidly to the fast-changing conditions of FANETs and better handle void regions, offering a promising solution for enhancing the performance and reliability of FANETs.
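The paper itself does not publish code, but the two mechanisms named in the abstract can be illustrated with a minimal, hypothetical sketch: a Bayesian (Beta-posterior) per-neighbor stability estimate that decides when to trust the Q-learned next hop versus falling back to AODV route discovery, and a "dual-update" Q-learning step in which links lying on a discovered AODV path receive an extra reward. All parameter values (learning rate, discount, switch threshold, AODV bonus) are illustrative assumptions, not figures from the paper.

```python
class HQASketch:
    """Hypothetical sketch of HQA's two core ideas (not the authors' code):
    1) a Bayesian stability evaluator: a Beta posterior over each neighbor's
       delivery success probability drives Q-learning/AODV switching;
    2) a dual-update reward: the standard Q-learning update plus a bonus
       reward for links confirmed to lie on a reliable AODV path."""

    def __init__(self, alpha=0.9, gamma=0.8, switch_threshold=0.6):
        self.alpha = alpha                # learning rate (assumed value)
        self.gamma = gamma                # discount factor (assumed value)
        self.threshold = switch_threshold  # stability cutoff (assumed value)
        self.q = {}                       # (node, neighbor) -> Q-value
        self.stats = {}                   # neighbor -> [successes, failures]

    def record_delivery(self, neighbor, success):
        # Update the Beta posterior: Beta(1, 1) uniform prior, then count
        # delivery successes and failures on this link.
        s, f = self.stats.get(neighbor, [1, 1])
        self.stats[neighbor] = [s + success, f + (1 - success)]

    def stability(self, neighbor):
        # Posterior mean of Beta(s, f): estimated P(link is stable).
        s, f = self.stats.get(neighbor, [1, 1])
        return s / (s + f)

    def use_q_learning(self, neighbor):
        # Route via Q-learning while the next hop looks stable; fall back
        # to AODV route discovery when stability drops below the threshold.
        return self.stability(neighbor) >= self.threshold

    def q_update(self, node, neighbor, reward, next_best_q, on_aodv_path=False):
        # Standard Q-learning update, with an extra reward term (the
        # "dual update") when the link lies on a discovered AODV path.
        key = (node, neighbor)
        old = self.q.get(key, 0.0)
        if on_aodv_path:
            reward += 0.5  # AODV-path bonus (assumed magnitude)
        self.q[key] = old + self.alpha * (reward + self.gamma * next_best_q - old)
        return self.q[key]
```

Feeding AODV-confirmed paths back into the Q-table this way is one plausible reading of how the hybrid design could speed up convergence: the agent does not have to discover reliable routes purely by exploration.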
Journal Description
Vehicular communications is a growing area covering communications between vehicles and with roadside communication infrastructure. Advances in wireless communications are making it possible to share information through real-time communications between vehicles and infrastructure. This has led to applications that increase vehicle safety and connect passengers to the Internet. Standardization efforts on vehicular communication are also underway to make vehicular transportation safer, greener and easier.
The aim of the journal is to publish high-quality peer-reviewed papers in the area of vehicular communications. The scope encompasses all types of communications involving vehicles, including vehicle-to-vehicle and vehicle-to-infrastructure. The scope includes (but is not limited to) the following topics related to vehicular communications:
Vehicle-to-vehicle and vehicle-to-infrastructure communications
Channel modelling, modulation and coding
Congestion control and scalability issues
Protocol design, testing and verification
Routing in vehicular networks
Security issues and countermeasures
Deployment and field testing
Reducing energy consumption and enhancing safety of vehicles
Wireless in-car networks
Data collection and dissemination methods
Mobility and handover issues
Safety and driver assistance applications
UAV
Underwater communications
Autonomous cooperative driving
Social networks
Internet of vehicles
Standardization of protocols.