Pedro Antonio Alarcon Granadeno, Theodore Chambers, Jane Cleland-Huang
{"title":"基于多智能体强化学习的无人机常见故障多源羽流追踪","authors":"Pedro Antonio Alarcon Granadeno, Theodore Chambers, Jane Cleland-Huang","doi":"10.1016/j.mlwa.2025.100737","DOIUrl":null,"url":null,"abstract":"<div><div>Hazardous airborne gas releases from accidents, leaks, or wildfires require rapid localization of emission sources under uncertain and turbulent conditions. Traditional gradient-based or biologically inspired strategies struggle in multi-source environments where odor cues are intermittent, aliased, and partially observed. We address this challenge by formulating multi-source plume tracing in three-dimensional fields as a cooperative partially observable Markov game. To solve it, we introduce an Action-Specific Double Deep Recurrent Q-Network (ADDRQN) that conditions on action–observation pairs to improve latent-state inference, and integrates teammate information through a permutation-invariant set encoder. Training follows a randomized centralized-training and decentralized-execution regime with host randomization, team-size variation, and noise injection. This yields a policy that is robust to agent failures (hardware malfunction, battery depletion, etc.), resilient to intermittent communication blackouts, and tolerant of sensor noise. Empirical evaluation in simulated Gaussian plume environments shows that ADDRQN achieves higher success rates and shorter localization times than non-action baselines, maintains strong performance under mid-mission disruptions, and scales predictably with team size.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"22 ","pages":"Article 100737"},"PeriodicalIF":4.9000,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-source plume tracing via multi-agent reinforcement learning under common UAV-faults\",\"authors\":\"Pedro Antonio Alarcon Granadeno, Theodore Chambers, Jane Cleland-Huang\",\"doi\":\"10.1016/j.mlwa.2025.100737\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Hazardous airborne gas releases from accidents, leaks, or wildfires require rapid localization of emission sources under uncertain and turbulent conditions. Traditional gradient-based or biologically inspired strategies struggle in multi-source environments where odor cues are intermittent, aliased, and partially observed. We address this challenge by formulating multi-source plume tracing in three-dimensional fields as a cooperative partially observable Markov game. To solve it, we introduce an Action-Specific Double Deep Recurrent Q-Network (ADDRQN) that conditions on action–observation pairs to improve latent-state inference, and integrates teammate information through a permutation-invariant set encoder. Training follows a randomized centralized-training and decentralized-execution regime with host randomization, team-size variation, and noise injection. This yields a policy that is robust to agent failures (hardware malfunction, battery depletion, etc.), resilient to intermittent communication blackouts, and tolerant of sensor noise. 
Empirical evaluation in simulated Gaussian plume environments shows that ADDRQN achieves higher success rates and shorter localization times than non-action baselines, maintains strong performance under mid-mission disruptions, and scales predictably with team size.</div></div>\",\"PeriodicalId\":74093,\"journal\":{\"name\":\"Machine learning with applications\",\"volume\":\"22 \",\"pages\":\"Article 100737\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-09-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning with applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666827025001203\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025001203","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Multi-source plume tracing via multi-agent reinforcement learning under common UAV-faults
Hazardous airborne gas releases from accidents, leaks, or wildfires require rapid localization of emission sources under uncertain and turbulent conditions. Traditional gradient-based or biologically inspired strategies struggle in multi-source environments where odor cues are intermittent, aliased, and partially observed. We address this challenge by formulating multi-source plume tracing in three-dimensional fields as a cooperative partially observable Markov game. To solve it, we introduce an Action-Specific Double Deep Recurrent Q-Network (ADDRQN) that conditions on action–observation pairs to improve latent-state inference, and integrates teammate information through a permutation-invariant set encoder. Training follows a randomized centralized-training and decentralized-execution regime with host randomization, team-size variation, and noise injection. This yields a policy that is robust to agent failures (hardware malfunction, battery depletion, etc.), resilient to intermittent communication blackouts, and tolerant of sensor noise. Empirical evaluation in simulated Gaussian plume environments shows that ADDRQN achieves higher success rates and shorter localization times than non-action baselines, maintains strong performance under mid-mission disruptions, and scales predictably with team size.
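The abstract above describes the key architectural ideas at a high level: a recurrent Q-network that conditions on action–observation pairs, a permutation-invariant set encoder for teammate information, and double Q-learning targets. The following is a minimal sketch of how such a network could be assembled, assuming a PyTorch setting; the class name, layer sizes, GRU backbone, and mean-pooling choice are illustrative assumptions, not the authors' exact implementation.

# Hedged sketch of an ADDRQN-style agent network (illustrative, not the paper's code).
import torch
import torch.nn as nn


class ADDRQNAgent(nn.Module):
    """Recurrent Q-network conditioning on action-observation pairs, with a
    permutation-invariant set encoder that fuses teammate observations."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.n_actions = n_actions
        # Encode each teammate observation independently, then mean-pool so the
        # result is invariant to teammate ordering and robust to team-size changes.
        self.teammate_enc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Per-step input: own observation, one-hot previous action, pooled teammate code.
        self.gru = nn.GRU(obs_dim + n_actions + hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs, prev_action, teammate_obs, h=None):
        # obs:          (batch, time, obs_dim)
        # prev_action:  (batch, time) integer action taken at the previous step
        # teammate_obs: (batch, time, n_teammates, obs_dim); n_teammates may vary
        a_onehot = torch.nn.functional.one_hot(prev_action, self.n_actions).float()
        pooled = self.teammate_enc(teammate_obs).mean(dim=2)  # permutation-invariant pooling
        x = torch.cat([obs, a_onehot, pooled], dim=-1)
        out, h = self.gru(x, h)
        return self.q_head(out), h  # per-step Q-values and recurrent state


def double_q_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    # Standard double-Q target: the online network selects the greedy action,
    # the target network evaluates it (van Hasselt et al., 2016).
    greedy = q_online_next.argmax(dim=-1, keepdim=True)
    next_q = q_target_next.gather(-1, greedy).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_q

In a centralized-training, decentralized-execution regime, each agent would run its own copy of this network at execution time from local observations and received teammate messages, while training batches (with randomized team sizes, injected noise, and simulated dropouts) update shared parameters; those training details are stated only qualitatively in the abstract.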