Multi-source plume tracing via multi-agent reinforcement learning under common UAV-faults

IF 4.9
Pedro Antonio Alarcon Granadeno, Theodore Chambers, Jane Cleland-Huang
{"title":"基于多智能体强化学习的无人机常见故障多源羽流追踪","authors":"Pedro Antonio Alarcon Granadeno,&nbsp;Theodore Chambers,&nbsp;Jane Cleland-Huang","doi":"10.1016/j.mlwa.2025.100737","DOIUrl":null,"url":null,"abstract":"<div><div>Hazardous airborne gas releases from accidents, leaks, or wildfires require rapid localization of emission sources under uncertain and turbulent conditions. Traditional gradient-based or biologically inspired strategies struggle in multi-source environments where odor cues are intermittent, aliased, and partially observed. We address this challenge by formulating multi-source plume tracing in three-dimensional fields as a cooperative partially observable Markov game. To solve it, we introduce an Action-Specific Double Deep Recurrent Q-Network (ADDRQN) that conditions on action–observation pairs to improve latent-state inference, and integrates teammate information through a permutation-invariant set encoder. Training follows a randomized centralized-training and decentralized-execution regime with host randomization, team-size variation, and noise injection. This yields a policy that is robust to agent failures (hardware malfunction, battery depletion, etc.), resilient to intermittent communication blackouts, and tolerant of sensor noise. Empirical evaluation in simulated Gaussian plume environments shows that ADDRQN achieves higher success rates and shorter localization times than non-action baselines, maintains strong performance under mid-mission disruptions, and scales predictably with team size.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"22 ","pages":"Article 100737"},"PeriodicalIF":4.9000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-source plume tracing via multi-agent reinforcement learning under common UAV-faults\",\"authors\":\"Pedro Antonio Alarcon Granadeno,&nbsp;Theodore Chambers,&nbsp;Jane Cleland-Huang\",\"doi\":\"10.1016/j.mlwa.2025.100737\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Hazardous airborne gas releases from accidents, leaks, or wildfires require rapid localization of emission sources under uncertain and turbulent conditions. Traditional gradient-based or biologically inspired strategies struggle in multi-source environments where odor cues are intermittent, aliased, and partially observed. We address this challenge by formulating multi-source plume tracing in three-dimensional fields as a cooperative partially observable Markov game. To solve it, we introduce an Action-Specific Double Deep Recurrent Q-Network (ADDRQN) that conditions on action–observation pairs to improve latent-state inference, and integrates teammate information through a permutation-invariant set encoder. Training follows a randomized centralized-training and decentralized-execution regime with host randomization, team-size variation, and noise injection. This yields a policy that is robust to agent failures (hardware malfunction, battery depletion, etc.), resilient to intermittent communication blackouts, and tolerant of sensor noise. 
Empirical evaluation in simulated Gaussian plume environments shows that ADDRQN achieves higher success rates and shorter localization times than non-action baselines, maintains strong performance under mid-mission disruptions, and scales predictably with team size.</div></div>\",\"PeriodicalId\":74093,\"journal\":{\"name\":\"Machine learning with applications\",\"volume\":\"22 \",\"pages\":\"Article 100737\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning with applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666827025001203\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025001203","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Hazardous airborne gas releases from accidents, leaks, or wildfires require rapid localization of emission sources under uncertain and turbulent conditions. Traditional gradient-based or biologically inspired strategies struggle in multi-source environments where odor cues are intermittent, aliased, and partially observed. We address this challenge by formulating multi-source plume tracing in three-dimensional fields as a cooperative partially observable Markov game. To solve it, we introduce an Action-Specific Double Deep Recurrent Q-Network (ADDRQN) that conditions on action–observation pairs to improve latent-state inference, and integrates teammate information through a permutation-invariant set encoder. Training follows a randomized centralized-training and decentralized-execution regime with host randomization, team-size variation, and noise injection. This yields a policy that is robust to agent failures (hardware malfunction, battery depletion, etc.), resilient to intermittent communication blackouts, and tolerant of sensor noise. Empirical evaluation in simulated Gaussian plume environments shows that ADDRQN achieves higher success rates and shorter localization times than non-action baselines, maintains strong performance under mid-mission disruptions, and scales predictably with team size.
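The page carries no implementation details beyond the abstract, but as a rough illustration of the kind of value network it describes, the following is a minimal, hypothetical PyTorch sketch of a recurrent Q-network that conditions on action–observation pairs and summarizes teammate information with a permutation-invariant (mean-pooled) set encoder. All class names, layer sizes, and input shapes are assumptions, not the authors' architecture.

# Hypothetical sketch of an ADDRQN-style value network (not the authors' code).
# The recurrent core is conditioned on (previous action, observation) pairs, and
# teammate features are folded in through a permutation-invariant set encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionObsRecurrentQNet(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, teammate_dim: int, hidden: int = 128):
        super().__init__()
        self.n_actions = n_actions
        # Shared per-teammate encoder; mean pooling over teammates makes the
        # summary invariant to teammate ordering and to team size.
        self.teammate_enc = nn.Sequential(nn.Linear(teammate_dim, hidden), nn.ReLU())
        # Recurrent core over [observation, one-hot previous action, team summary].
        self.gru = nn.GRU(obs_dim + n_actions + hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs, prev_action, teammates, h0=None):
        # obs:         (B, T, obs_dim)
        # prev_action: (B, T) integer action indices
        # teammates:   (B, T, K, teammate_dim), K = number of visible teammates
        a_onehot = F.one_hot(prev_action, self.n_actions).float()
        team = self.teammate_enc(teammates).mean(dim=2)  # permutation-invariant pooling
        x = torch.cat([obs, a_onehot, team], dim=-1)
        out, h = self.gru(x, h0)
        return self.q_head(out), h  # per-step Q-values and final hidden state

# Example forward pass with made-up sizes.
if __name__ == "__main__":
    net = ActionObsRecurrentQNet(obs_dim=8, n_actions=6, teammate_dim=5)
    obs = torch.randn(2, 10, 8)
    prev_a = torch.randint(0, 6, (2, 10))
    mates = torch.randn(2, 10, 3, 5)
    q, _ = net(obs, prev_a, mates)
    print(q.shape)  # torch.Size([2, 10, 6])

Because the team summary is a mean over teammate embeddings, the same network can be run with any team size, which is consistent with the team-size variation described in the training regime.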
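The "Double" in ADDRQN presumably refers to the standard Double DQN target, which selects the next action with the online network and evaluates it with the target network; in a partially observed setting the recurrent network's action–observation history stands in for the state. This is the generic form, not a claim about the paper's exact loss:

% Generic Double DQN target with online parameters \theta and target
% parameters \theta^{-}, written over the action--observation history h_t
% that a recurrent Q-network summarizes:
y_t = r_t + \gamma \, Q\!\left(h_{t+1},\, \operatorname*{arg\,max}_{a'} Q(h_{t+1}, a';\, \theta);\ \theta^{-}\right)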
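For context on the evaluation setting, the textbook Gaussian plume dispersion model for a continuous point source is given below; the paper's simulator is only described as "Gaussian plume", so the exact parameterization here is an assumption.

% Concentration at downwind position (x, y, z) for a continuous point source
% with emission rate Q, effective release height H, mean wind speed u, and
% crosswind/vertical dispersion coefficients \sigma_y(x), \sigma_z(x);
% the second exponential accounts for reflection at the ground.
C(x, y, z) = \frac{Q}{2\pi u \,\sigma_y \sigma_z}
  \exp\!\left(-\frac{y^2}{2\sigma_y^2}\right)
  \left[\exp\!\left(-\frac{(z-H)^2}{2\sigma_z^2}\right)
      + \exp\!\left(-\frac{(z+H)^2}{2\sigma_z^2}\right)\right]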
Source journal
Machine learning with applications
Subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications
Self-citation rate: 0.00%
Publication volume: 0
Review time: 98 days