{"title":"LSTM-Enhanced TD3 and Behavior Cloning for UAV Trajectory Tracking Control.","authors":"Yuanhang Qi, Jintao Hu, Fujie Wang, Gewen Huang","doi":"10.3390/biomimetics10090591","DOIUrl":null,"url":null,"abstract":"<p><p>Unmanned aerial vehicles (UAVs) often face significant challenges in trajectory tracking within complex dynamic environments, where uncertainties, external disturbances, and nonlinear dynamics hinder accurate and stable control. To address this issue, a bio-inspired deep reinforcement learning (DRL) algorithm is proposed, integrating behavior cloning (BC) and long short-term memory (LSTM) networks. This method can achieve autonomous learning of high-precision control policy without establishing an accurate system dynamics model. Motivated by the memory and prediction functions of biological neural systems, an LSTM module is embedded into the policy network of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. This structure captures temporal state patterns more effectively, enhancing adaptability to trajectory variations and resilience to delays or disturbances. Compared to memoryless networks, the LSTM-based design better replicates biological time-series processing, improving tracking stability and accuracy. In addition, behavior cloning is employed to pre-train the DRL policy using expert demonstrations, mimicking the way animals learn from observation. This biomimetic plausible initialization accelerates convergence by reducing inefficient early-stage exploration. By combining offline imitation with online learning, the TD3-LSTM-BC framework balances expert guidance and adaptive optimization, analogous to innate and experience-based learning in nature. Simulation experimental results confirm the superior robustness and tracking accuracy of the proposed method, demonstrating its potential as a control solution for autonomous UAVs.</p>","PeriodicalId":8907,"journal":{"name":"Biomimetics","volume":"10 9","pages":""},"PeriodicalIF":3.9000,"publicationDate":"2025-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12467034/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biomimetics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3390/biomimetics10090591","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Abstract
Unmanned aerial vehicles (UAVs) often face significant challenges in trajectory tracking within complex dynamic environments, where uncertainties, external disturbances, and nonlinear dynamics hinder accurate and stable control. To address this issue, a bio-inspired deep reinforcement learning (DRL) algorithm is proposed that integrates behavior cloning (BC) and long short-term memory (LSTM) networks. The method learns a high-precision control policy autonomously, without requiring an accurate system dynamics model. Motivated by the memory and prediction functions of biological neural systems, an LSTM module is embedded into the policy network of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. This structure captures temporal state patterns more effectively, enhancing adaptability to trajectory variations and resilience to delays and disturbances. Compared with memoryless networks, the LSTM-based design better replicates biological time-series processing, improving tracking stability and accuracy. In addition, behavior cloning is employed to pre-train the DRL policy on expert demonstrations, mimicking the way animals learn from observation. This biomimetically plausible initialization accelerates convergence by reducing inefficient early-stage exploration. By combining offline imitation with online learning, the TD3-LSTM-BC framework balances expert guidance and adaptive optimization, analogous to innate and experience-based learning in nature. Simulation results confirm the superior robustness and tracking accuracy of the proposed method, demonstrating its potential as a control solution for autonomous UAVs.
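To make the architectural idea concrete, the following is a minimal PyTorch sketch of what an LSTM-enhanced TD3 actor could look like. This is an illustration under assumptions, not the paper's actual implementation: the class name LSTMActor, the layer sizes, and the choice to feed a fixed window of recent states are all illustrative.

```python
# Illustrative sketch of an LSTM-based TD3 actor (assumed architecture;
# the paper's exact network and hyperparameters are not given in the abstract).
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int,
                 hidden_dim: int = 128, max_action: float = 1.0):
        super().__init__()
        # The LSTM processes a window of recent states, giving the policy
        # memory of temporal patterns (delays, disturbances, trajectory trends).
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, action_dim),
            nn.Tanh(),  # bounded output, rescaled to the action range below
        )
        self.max_action = max_action

    def forward(self, state_seq: torch.Tensor) -> torch.Tensor:
        # state_seq: (batch, seq_len, state_dim); act on the final hidden state.
        _, (h_n, _) = self.lstm(state_seq)
        return self.max_action * self.head(h_n[-1])
```

In a memoryless TD3 actor the forward pass would see only the current state; here the recurrent hidden state summarizes the recent state history, which is what the abstract credits for the improved resilience to delays and disturbances.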
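The behavior-cloning pre-training stage can likewise be sketched as a supervised regression step onto expert actions before online TD3 training begins. Again, this is a hypothetical sketch: the expert dataset format, the MSE imitation loss, and the function name bc_pretrain_step are assumptions, not details from the paper.

```python
# Illustrative behavior-cloning pre-training step (assumed setup; the paper's
# expert data format and imitation loss are not specified in the abstract).
import torch
import torch.nn.functional as F

def bc_pretrain_step(actor, optimizer, expert_states, expert_actions):
    """One supervised step: regress the actor's output onto expert actions.

    expert_states:  (batch, seq_len, state_dim) windows of expert trajectories
    expert_actions: (batch, action_dim) expert action at the end of each window
    """
    pred_actions = actor(expert_states)
    loss = F.mse_loss(pred_actions, expert_actions)  # imitation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running some number of such steps before switching to standard TD3 updates gives the "offline imitation, then online learning" schedule the abstract describes: the cloned policy starts near the expert's behavior, so early exploration wastes less time in clearly bad regions of the action space.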