Trajectory design for data collection under insufficient UAV energy: A staged actor–critic reinforcement learning approach

IF 4.1 · CAS Zone 2 (Computer Science) · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Jing Mei, Yuejia Zhang, Zhao Tong, Keqin Li
DOI: 10.1016/j.sysarc.2025.103566
Journal of Systems Architecture, Volume 168, Article 103566, published 2025-09-15 (Journal Article)
Citations: 0

Abstract

Fixed-wing unmanned aerial vehicles (UAVs) offer distinct advantages for large-scale environmental sensor data collection. In forest and marine scenarios, UAVs typically depart from a fixed location, collect data along a route, and return. Unlike existing work aiming to minimize energy consumption in data collection tasks, this study focuses on the scenario where a UAV's initial energy may not be sufficient to visit all sensor nodes. We aim to maximize data collection under insufficient battery energy while ensuring a safe return. To this end, we adopt the twin delayed deep deterministic policy gradient (TD3) algorithm with three designed reward functions, and introduce a stage-based safe-action algorithm, termed Staged Safe-Action TD3 (SS-TD3). An energy consumption model incorporating acceleration and a segmented time model are used to enhance exploration efficiency. To tackle sparse binary rewards and the suboptimal convergence of complex reward functions in reinforcement learning, a staged training approach, Staged Actor–Critic based reinforcement Learning (S-ACL), is proposed as one of the fundamental components of SS-TD3. Experimental results show that SS-TD3 achieves the best energy efficiency compared to baselines, while S-ACL significantly improves policy performance in complex reward environments.
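The staged-training idea behind S-ACL, switching from a sparse binary reward to a denser composite reward as training progresses, can be illustrated with a minimal sketch. All function names, reward values, and the stage-switch threshold below are hypothetical placeholders, not the paper's actual formulation:

```python
# Hypothetical sketch of staged reward shaping: early training uses a sparse
# binary signal (did the UAV return safely?), and later stages add a denser
# term rewarding collected data. Values and the switch point are illustrative.

def reward_stage1(data_collected: float, returned_safely: bool) -> float:
    # Sparse binary reward: only safe return matters at first.
    return 1.0 if returned_safely else -1.0

def reward_stage2(data_collected: float, returned_safely: bool) -> float:
    # Denser shaping: favor data volume once safe return is learned.
    return data_collected + (1.0 if returned_safely else -5.0)

def staged_reward(episode: int, data_collected: float,
                  returned_safely: bool, switch_at: int = 500) -> float:
    """Select the reward function for the current training stage."""
    if episode < switch_at:
        return reward_stage1(data_collected, returned_safely)
    return reward_stage2(data_collected, returned_safely)
```

A TD3 agent would simply call `staged_reward` when computing each transition's reward, leaving the actor and critic updates unchanged; only the reward signal varies across training stages.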
Source Journal
Journal of Systems Architecture (Engineering Technology - Computer: Hardware)
CiteScore: 8.70
Self-citation rate: 15.60%
Annual publications: 226
Review time: 46 days
Journal description: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures, as well as additional subjects in the computer and system architecture area, fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems, including methodologies, techniques and tools for their design, as well as novel designs of software components, falls within the scope of this journal. Novel applications that use embedded systems are also central to this journal. While hardware is not a part of this journal, hardware/software co-design methods that consider the interplay between software and hardware components, with an emphasis on software, are also relevant here.