An improved elitist-Q-Learning path planning strategy for VTOL air-ground vehicle using convolutional neural network mode prediction
Authors: Jing Zhao, Chao Yang, Weida Wang, Ying Li, Tianqi Qie, Bin Xu
Journal: Advanced Engineering Informatics, volume 65
Publication date: 2025-04-05
Publication type: Journal Article
DOI: 10.1016/j.aei.2025.103316
URL: https://www.sciencedirect.com/science/article/pii/S1474034625002095
Impact factor: 8.0 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Vertical take-off and landing (VTOL) air-ground integrated vehicles have received extensive attention in rescue, transportation, and other task fields. To improve task efficiency in complex environments such as post-disaster cities and scrubland, these vehicles require efficient and rational path planning. In such environments, it is difficult to obtain complete and accurate obstacle information. The planning process must therefore use limited obstacle perception information both to switch between air and ground modes and to quickly obtain the shortest planned trajectory. To address these issues, this paper proposes an improved elitist-Q-Learning path planning strategy for the VTOL air-ground vehicle using convolutional neural network mode prediction. Firstly, to predict the mode switching actions, a convolutional neural mode prediction network is constructed with local obstacle information as input data. Secondly, based on the predicted actions, an elitist-Q-Learning (EQL) multi-mode planning algorithm is designed. A new reward function that accounts for the multi-mode actions is proposed. On this basis, heuristic correction and elitist adjusting factors replace the fixed rewards of traditional Q-Learning with rewards that are adjusted dynamically during the iterative process, so that the Q table is updated quickly and converges to optimal values. Finally, the proposed strategy is verified on randomly generated maps of 1000 m × 1000 m. Results show that the prediction accuracy stays above 93 %. The path distance is 4.56 % and 1.75 % shorter than that of traditional Q-Learning and A* with mode prediction, respectively, and equal to that of BAS-A*, LPA*, and D* Lite with mode prediction. Compared to traditional Q-Learning, the strategy reduces computational time by 36.61 % and needs 58.9 % fewer iterations to converge.
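The abstract does not give the EQL reward function, the elitist adjusting factor, or the network architecture, so none of them is reproduced here. As a rough illustration of the general idea of replacing a fixed step reward with one adjusted by a heuristic, the following minimal sketch runs plain tabular Q-Learning on a small grid and adds a Manhattan-distance shaping term to the step reward. Every name, the grid, the reward values, and the hyperparameters are assumptions made for this example, not the authors' method.

```python
import random

# Four ground moves: up, down, left, right (an assumption; no air mode is modelled).
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def train_q_table(grid, start, goal, episodes=500, alpha=0.5,
                  gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-Learning with a distance-based shaping term (illustrative only)."""
    rng = random.Random(seed)
    rows, cols = len(grid), len(grid[0])
    q = {(r, c): [0.0] * 4 for r in range(rows) for c in range(cols)}

    def dist(cell):
        # Manhattan distance to the goal, used as the heuristic.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    for _ in range(episodes):
        state = start
        for _ in range(4 * rows * cols):  # cap on steps per episode
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:
                a = max(range(4), key=lambda i: q[state][i])
            nr, nc = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
            blocked = not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1
            nxt = state if blocked else (nr, nc)
            if nxt == goal:
                reward = 100.0
            elif blocked:
                reward = -10.0
            else:
                # Fixed step cost plus a heuristic term: 0 when moving
                # closer to the goal, -2 when moving away from it.
                reward = -1.0 + (dist(state) - dist(nxt))
            target = reward if nxt == goal else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if state == goal:
                break
    return q


def greedy_path(q, grid, start, goal, max_steps=100):
    """Follow the highest-valued action from start; stop at the goal or a blocked move."""
    rows, cols = len(grid), len(grid[0])
    path, state = [start], start
    for _ in range(max_steps):
        if state == goal:
            break
        a = max(range(4), key=lambda i: q[state][i])
        nr, nc = state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1]
        if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
            break
        state = (nr, nc)
        path.append(state)
    return path


if __name__ == "__main__":
    demo_grid = [  # 0 = free cell, 1 = obstacle
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
        [0, 1, 0, 0],
    ]
    q_table = train_q_table(demo_grid, (0, 0), (3, 3))
    print(greedy_path(q_table, demo_grid, (0, 0), (3, 3)))
```

The shaping term only changes how quickly useful values propagate through the Q table; the update rule itself is the standard Q-Learning one. An elitist factor or a learned mode-switching action would have to be added on top of this, following the definitions in the full paper.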
About the journal:
Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitative and quantitative. Advanced Engineering Informatics is abstracted and indexed in Science Citation Index Expanded, Scopus, and INSPEC.