{"title":"复杂环境下无人机跟踪的改进近端策略优化","authors":"Tao Zhang , Qingyan Zhou , Yue Zheng , Huiwen Yu","doi":"10.1016/j.knosys.2025.113627","DOIUrl":null,"url":null,"abstract":"<div><div>Unmanned Aerial Vehicles (UAVs) operating in urban environments face critical challenges in dynamic field of view (FOV) management and obstacle avoidance. To address these issues, this paper proposes an improved Proximal Policy Optimization algorithm (I-PPO) that integrates seven key enhancements, including reward scaling, gradient clip, and others. This algorithm improves sample efficiency and reduces policy oscillation in complex environments, in which we have developed a three-dimensional simulation environment capable of multi-terrain parametric modeling that integrates weather-related FOV attenuation models and intelligent dynamic obstacle modules. Focusing on the tracking task, the study designs a reward function based on a hierarchical penalty system and priority rules. This approach ensures operational safety while maximizing target vehicle visibility, thereby optimizing agent performance under environmental uncertainties. Experimental results demonstrate that in plain environments, I-PPO yields a 2.9-fold increase in mean cumulative reward and extends target tracking duration by a factor of 2.7 compared to the standard PPO. In hilly terrain, I-PPO maintains reward performance comparable to its plain environment baseline, exhibiting merely a 2% performance degradation, confirming terrain adaptability. In mountainous terrain, while it shows a 12% reward reduction versus hilly terrain, it exhibits a 38.9% reduction in reward variance (measured by IQR) compared to Discrete Soft Actor–Critic (DSAC), this demonstrates significant robustness enhancement. In scenarios with 10 intelligent dynamic obstacles, the algorithm achieves stable convergence within 984 time units and demonstrates equivalent robustness under weather-induced FOV attenuation across multi-terrain environments. 
Furthermore, Theoretical analysis confirms the method’s compliance with policy gradient convergence requirements.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"319 ","pages":"Article 113627"},"PeriodicalIF":7.2000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Improved proximal policy optimization for UAV tracking in complex environments\",\"authors\":\"Tao Zhang , Qingyan Zhou , Yue Zheng , Huiwen Yu\",\"doi\":\"10.1016/j.knosys.2025.113627\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Unmanned Aerial Vehicles (UAVs) operating in urban environments face critical challenges in dynamic field of view (FOV) management and obstacle avoidance. To address these issues, this paper proposes an improved Proximal Policy Optimization algorithm (I-PPO) that integrates seven key enhancements, including reward scaling, gradient clip, and others. This algorithm improves sample efficiency and reduces policy oscillation in complex environments, in which we have developed a three-dimensional simulation environment capable of multi-terrain parametric modeling that integrates weather-related FOV attenuation models and intelligent dynamic obstacle modules. Focusing on the tracking task, the study designs a reward function based on a hierarchical penalty system and priority rules. This approach ensures operational safety while maximizing target vehicle visibility, thereby optimizing agent performance under environmental uncertainties. Experimental results demonstrate that in plain environments, I-PPO yields a 2.9-fold increase in mean cumulative reward and extends target tracking duration by a factor of 2.7 compared to the standard PPO. 
In hilly terrain, I-PPO maintains reward performance comparable to its plain environment baseline, exhibiting merely a 2% performance degradation, confirming terrain adaptability. In mountainous terrain, while it shows a 12% reward reduction versus hilly terrain, it exhibits a 38.9% reduction in reward variance (measured by IQR) compared to Discrete Soft Actor–Critic (DSAC), this demonstrates significant robustness enhancement. In scenarios with 10 intelligent dynamic obstacles, the algorithm achieves stable convergence within 984 time units and demonstrates equivalent robustness under weather-induced FOV attenuation across multi-terrain environments. Furthermore, Theoretical analysis confirms the method’s compliance with policy gradient convergence requirements.</div></div>\",\"PeriodicalId\":49939,\"journal\":{\"name\":\"Knowledge-Based Systems\",\"volume\":\"319 \",\"pages\":\"Article 113627\"},\"PeriodicalIF\":7.2000,\"publicationDate\":\"2025-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Knowledge-Based Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0950705125006732\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125006732","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL 
INTELLIGENCE","Score":null,"Total":0}
Improved proximal policy optimization for UAV tracking in complex environments
Unmanned Aerial Vehicles (UAVs) operating in urban environments face critical challenges in dynamic field-of-view (FOV) management and obstacle avoidance. To address these issues, this paper proposes an improved Proximal Policy Optimization algorithm (I-PPO) that integrates seven key enhancements, including reward scaling and gradient clipping. The algorithm improves sample efficiency and reduces policy oscillation in complex environments. To evaluate it, we developed a three-dimensional simulation environment capable of multi-terrain parametric modeling, integrating weather-related FOV attenuation models and intelligent dynamic obstacle modules. Focusing on the tracking task, the study designs a reward function based on a hierarchical penalty system and priority rules. This approach ensures operational safety while maximizing target-vehicle visibility, thereby optimizing agent performance under environmental uncertainty. Experimental results demonstrate that in plain environments, I-PPO yields a 2.9-fold increase in mean cumulative reward and extends target tracking duration by a factor of 2.7 compared with standard PPO. In hilly terrain, I-PPO maintains reward performance comparable to its plain-environment baseline, with only a 2% degradation, confirming terrain adaptability. In mountainous terrain, it shows a 12% reward reduction versus hilly terrain but a 38.9% reduction in reward variance (measured by IQR) compared with Discrete Soft Actor–Critic (DSAC), demonstrating a significant robustness improvement. In scenarios with 10 intelligent dynamic obstacles, the algorithm achieves stable convergence within 984 time units and exhibits equivalent robustness under weather-induced FOV attenuation across multi-terrain environments. Furthermore, theoretical analysis confirms the method's compliance with policy-gradient convergence requirements.
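Two of the enhancements named in the abstract, reward scaling and gradient clipping, are standard PPO stabilization tricks. The sketch below shows generic implementations of both; it is an illustration of the general techniques only, not the paper's exact formulation, and all names (`RewardScaler`, `clip_grad_by_norm`) are hypothetical.

```python
import numpy as np

class RewardScaler:
    """Scale rewards by the running std of the discounted return.

    A common PPO trick: keeps reward magnitudes in a stable range so the
    value loss and advantages do not drift in scale. Uses Welford's
    online algorithm for the running variance.
    """
    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.ret = 0.0   # discounted-return accumulator
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations (Welford)

    def __call__(self, reward):
        self.ret = self.gamma * self.ret + reward
        self.count += 1
        delta = self.ret - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.ret - self.mean)
        std = np.sqrt(self.m2 / self.count) if self.count > 1 else 1.0
        return reward / (std + 1e-8)

def clip_grad_by_norm(grads, max_norm=0.5):
    """Rescale a list of gradient arrays so their global L2 norm
    is at most max_norm (the clipping variant typically paired with PPO)."""
    total = np.sqrt(sum(np.sum(g * g) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-8))
    return [g * scale for g in grads]
```

In a training loop, each environment reward would pass through `RewardScaler` before advantage estimation, and `clip_grad_by_norm` would be applied to the policy gradients before each optimizer step.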
Journal overview:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built with knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.