Alejandro Mendoza Barrionuevo;Samuel Yanes Luis;Daniel Gutiérrez Reina;Sergio L. Toral Marín
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 5, pp. 4930-4937 (Journal Article)
Published: 2025-03-28
DOI: 10.1109/LRA.2025.3555940
URL: https://ieeexplore.ieee.org/document/10945400/
Impact factor: 4.6 (JCR Q2, Robotics)
Optimizing Plastic Waste Collection in Water Bodies Using Heterogeneous Autonomous Surface Vehicles With Deep Reinforcement Learning
This letter presents a model-free deep reinforcement learning framework for informative path planning with heterogeneous fleets of autonomous surface vehicles that locate and collect plastic waste. The system employs two teams of vehicles: scouts and cleaners. The teams are coordinated through a deep reinforcement learning approach that lets agents learn strategies to maximize cleaning efficiency. The scout team's primary objective is to provide an up-to-date contamination model, while the cleaner team follows this model to collect as much waste as possible. This strategy yields heterogeneous teams that optimize fleet efficiency through inter-team cooperation supported by a tailored reward function. Several training configurations of the proposed algorithm are compared with other state-of-the-art algorithms in three distinct scenarios: one with moderate convexity, one with narrow corridors and challenging access, and one that is larger, more complex, and harder to access. The results show that the deep reinforcement learning based algorithms outperform the baselines and adapt better across scenarios. In addition, training with examples of actions from other algorithms further improves performance, especially in scenarios where the search space is larger.
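The scout/cleaner split described in the abstract can be illustrated with a toy sketch. This is not the reward function or environment from the letter, which the abstract does not specify. It assumes a hypothetical grid world in which scouts are rewarded for reducing the error of a shared contamination model and cleaners are rewarded for the items they collect, minus a small movement cost. All names (`ToyWorld`, `scout_step`, `cleaner_step`) and all numeric values are invented for illustration.

```python
# Toy illustration only: every name, weight, and rule here is an assumption,
# not the reward function or environment from the letter.
from dataclasses import dataclass

Cell = tuple[int, int]


@dataclass
class ToyWorld:
    trash: dict[Cell, int]  # ground-truth trash items per cell
    model: dict[Cell, int]  # fleet's shared belief about trash per cell

    def model_error(self) -> int:
        """Sum of absolute differences between belief and ground truth."""
        cells = set(self.trash) | set(self.model)
        return sum(abs(self.trash.get(c, 0) - self.model.get(c, 0)) for c in cells)


def scout_step(world: ToyWorld, pos: Cell, radius: int = 1) -> float:
    """Scout observes cells within `radius`; reward is the model error removed."""
    before = world.model_error()
    x0, y0 = pos
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            cell = (x0 + dx, y0 + dy)
            true_count = world.trash.get(cell, 0)
            if true_count:
                world.model[cell] = true_count
            else:
                world.model.pop(cell, None)
    return float(before - world.model_error())


def cleaner_step(world: ToyWorld, pos: Cell, step_cost: float = 0.1) -> float:
    """Cleaner collects trash at its own cell; reward is items minus move cost."""
    collected = world.trash.pop(pos, 0)
    world.model.pop(pos, None)  # the cell is now known to be clean
    return collected - step_cost


if __name__ == "__main__":
    world = ToyWorld(trash={(2, 2): 3, (5, 5): 1}, model={})
    print("scout reward:", scout_step(world, (2, 3)))
    print("cleaner reward:", cleaner_step(world, (2, 2)))
```

In this sketch the two rewards are coupled only through the shared `model`: a cleaner that follows an accurate model reaches trash sooner, so good scouting indirectly raises the cleaners' return. That mirrors the inter-team cooperation the abstract describes, under the simplifying assumptions stated above.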
About the journal:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.