Putting Data at the Centre of Offline Multi-Agent Reinforcement Learning
Claude Formanek, Louise Beyers, Callum Rhys Tilbury, Jonathan P. Shock, Arnu Pretorius
arXiv:2409.12001, 18 September 2024

Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We first substantiate this claim by surveying the literature, showing how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets. We then show why neglecting the nature of the data is problematic, through salient examples of how tightly algorithmic performance is coupled to the dataset used, necessitating a common foundation for experiments in the field. In response, we take a big step towards improving data usage and data awareness in offline MARL, with three key contributions: (1) a clear guideline for generating novel datasets; (2) a standardisation of over 80 existing datasets, hosted in a publicly available repository, using a consistent storage format and easy-to-use API; and (3) a suite of analysis tools that allow us to understand these datasets better, aiding further development.

HARP: Human-Assisted Regrouping with Permutation Invariant Critic for Multi-Agent Reinforcement Learning
Huawen Hu, Enze Shi, Chenxi Yue, Shuocun Yang, Zihao Wu, Yiwei Li, Tianyang Zhong, Tuo Zhang, Tianming Liu, Shu Zhang
arXiv:2409.11741, 18 September 2024

Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and to provide critical guidance and feedback in complex fields. However, many existing approaches focus on single-agent tasks and require continuous human involvement during training, significantly increasing the human workload and limiting scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, enabling non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and use the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, allowing non-expert users to contribute valuable suggestions. Across multiple collaboration scenarios, our approach is able to leverage limited guidance from non-experts to enhance performance. The project can be found at https://github.com/huawen-hu/HARP.

On-policy Actor-Critic Reinforcement Learning for Multi-UAV Exploration
Ali Moltajaei Farid, Jafar Roshanian, Malek Mouhoub
arXiv:2409.11058, 17 September 2024

Unmanned aerial vehicles (UAVs) have become increasingly popular in various fields, including precision agriculture, search and rescue, and remote sensing. However, exploring unknown environments remains a significant challenge. This study addresses this challenge by using on-policy reinforcement learning (RL) with Proximal Policy Optimization (PPO) to explore a two-dimensional area of interest with multiple UAVs. The UAVs avoid collisions with obstacles and with each other, and perform the exploration in a distributed manner. The proposed solution includes actor-critic networks that use deep convolutional neural networks (CNNs) and long short-term memory (LSTM) to identify the UAVs and the areas that have already been covered. Compared to other RL techniques, such as policy gradient (PG) and asynchronous advantage actor-critic (A3C), simulation results demonstrate the superiority of the proposed PPO approach. The results also show that combining an LSTM with a CNN in the critic can improve exploration. Because the exploration has to work in unknown environments, we further show that the proposed setup can complete coverage on new maps that differ from those used during training. Finally, we show how tuning hyperparameters affects overall performance.

{"title":"CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark","authors":"Zachary S. Siegel, Sayash Kapoor, Nitya Nagdir, Benedikt Stroebl, Arvind Narayanan","doi":"arxiv-2409.11363","DOIUrl":"https://doi.org/arxiv-2409.11363","url":null,"abstract":"AI agents have the potential to aid users on a variety of consequential\u0000tasks, including conducting scientific research. To spur the development of\u0000useful agents, we need benchmarks that are challenging, but more crucially,\u0000directly correspond to real-world tasks of interest. This paper introduces such\u0000a benchmark, designed to measure the accuracy of AI agents in tackling a\u0000crucial yet surprisingly challenging aspect of scientific research:\u0000computational reproducibility. This task, fundamental to the scientific\u0000process, involves reproducing the results of a study using the provided code\u0000and data. We introduce CORE-Bench (Computational Reproducibility Agent\u0000Benchmark), a benchmark consisting of 270 tasks based on 90 scientific papers\u0000across three disciplines (computer science, social science, and medicine).\u0000Tasks in CORE-Bench consist of three difficulty levels and include both\u0000language-only and vision-language tasks. We provide an evaluation system to\u0000measure the accuracy of agents in a fast and parallelizable way, saving days of\u0000evaluation time for each run compared to a sequential implementation. We\u0000evaluated two baseline agents: the general-purpose AutoGPT and a task-specific\u0000agent called CORE-Agent. We tested both variants using two underlying language\u0000models: GPT-4o and GPT-4o-mini. The best agent achieved an accuracy of 21% on\u0000the hardest task, showing the vast scope for improvement in automating routine\u0000scientific tasks. Having agents that can reproduce existing work is a necessary\u0000step towards building agents that can conduct novel research and could verify\u0000and improve the performance of other research agents. We hope that CORE-Bench\u0000can improve the state of reproducibility and spur the development of future\u0000research agents.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"55 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bearing-Distance Based Flocking with Zone-Based Interactions","authors":"Hossein B. Jond","doi":"arxiv-2409.10047","DOIUrl":"https://doi.org/arxiv-2409.10047","url":null,"abstract":"This paper presents a novel zone-based flocking control approach suitable for\u0000dynamic multi-agent systems (MAS). Inspired by Reynolds behavioral rules for\u0000$boids$, flocking behavioral rules with the zones of repulsion, conflict,\u0000attraction, and surveillance are introduced. For each agent, using only bearing\u0000and distance measurements, behavioral deviation vectors quantify the deviations\u0000from the local separation, local and global flock velocity alignment, local\u0000cohesion, obstacle avoidance and boundary conditions, and strategic separation\u0000for avoiding alien agents. The control strategy uses the local perception-based\u0000behavioral deviation vectors to guide each agent's motion. Additionally, the\u0000control strategy incorporates a directionally-aware obstacle avoidance\u0000mechanism that prioritizes obstacles in the agent's forward path. Simulation\u0000results validate the effectiveness of this approach in creating flexible,\u0000adaptable, and scalable flocking behavior.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"21 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Context-aware Advertisement Modeling and Applications in Rapid Transit Systems","authors":"Afzal Ahmed, Muhammad Raees","doi":"arxiv-2409.09956","DOIUrl":"https://doi.org/arxiv-2409.09956","url":null,"abstract":"In today's businesses, marketing has been a central trend for growth.\u0000Marketing quality is equally important as product quality and relevant metrics.\u0000Quality of Marketing depends on targeting the right person. Technology\u0000adaptations have been slow in many fields but have captured some aspects of\u0000human life to make an impact. For instance, in marketing, recent developments\u0000have provided a significant shift toward data-driven approaches. In this paper,\u0000we present an advertisement model using behavioral and tracking analysis. We\u0000extract users' behavioral data upholding their privacy principle and perform\u0000data manipulations and pattern mining for effective analysis. We present a\u0000model using the agent-based modeling (ABM) technique, with the target audience\u0000of rapid transit system users to target the right person for advertisement\u0000applications. We also outline the Overview, Design, and Details concept of ABM.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-agent Path Finding in Continuous Environment","authors":"Kristýna Janovská, Pavel Surynek","doi":"arxiv-2409.10680","DOIUrl":"https://doi.org/arxiv-2409.10680","url":null,"abstract":"We address a variant of multi-agent path finding in continuous environment\u0000(CE-MAPF), where agents move along sets of smooth curves. Collisions between\u0000agents are resolved via avoidance in the space domain. A new Continuous\u0000Environment Conflict-Based Search (CE-CBS) algorithm is proposed in this work.\u0000CE-CBS combines conflict-based search (CBS) for the high-level search framework\u0000with RRT* for low-level path planning. The CE-CBS algorithm is tested under\u0000various settings on diverse CE-MAPF instances. Experimental results show that\u0000CE-CBS is competitive w.r.t. to other algorithms that consider continuous\u0000aspect in MAPF such as MAPF with continuous time.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"6 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reducing Leximin Fairness to Utilitarian Optimization
Eden Hartman, Yonatan Aumann, Avinatan Hassidim, Erel Segal-Halevi
arXiv:2409.10395, 16 September 2024

Two prominent objectives in social choice are utilitarian (maximizing the sum of agents' utilities) and leximin (maximizing the smallest agent's utility, then the second-smallest, and so on). Utilitarianism is typically computationally easier to attain but is generally viewed as less fair. This paper presents a general reduction scheme that, given a utilitarian solver, produces a distribution over outcomes that is leximin in expectation. Importantly, the scheme is robust in the sense that, given an approximate utilitarian solver, it produces an outcome that is approximately leximin (in expectation), with the same approximation factor. We apply our scheme to several social choice problems: stochastic allocations of indivisible goods, giveaway lotteries, and fair lotteries for participatory budgeting.

{"title":"Decentralized Safe and Scalable Multi-Agent Control under Limited Actuation","authors":"Vrushabh Zinage, Abhishek Jha, Rohan Chandra, Efstathios Bakolas","doi":"arxiv-2409.09573","DOIUrl":"https://doi.org/arxiv-2409.09573","url":null,"abstract":"To deploy safe and agile robots in cluttered environments, there is a need to\u0000develop fully decentralized controllers that guarantee safety, respect\u0000actuation limits, prevent deadlocks, and scale to thousands of agents. Current\u0000approaches fall short of meeting all these goals: optimization-based methods\u0000ensure safety but lack scalability, while learning-based methods scale but do\u0000not guarantee safety. We propose a novel algorithm to achieve safe and scalable\u0000control for multiple agents under limited actuation. Specifically, our approach\u0000includes: $(i)$ learning a decentralized neural Integral Control Barrier\u0000function (neural ICBF) for scalable, input-constrained control, $(ii)$\u0000embedding a lightweight decentralized Model Predictive Control-based Integral\u0000Control Barrier Function (MPC-ICBF) into the neural network policy to ensure\u0000safety while maintaining scalability, and $(iii)$ introducing a novel method to\u0000minimize deadlocks based on gradient-based optimization techniques from machine\u0000learning to address local minima in deadlocks. Our numerical simulations show\u0000that this approach outperforms state-of-the-art multi-agent control algorithms\u0000in terms of safety, input constraint satisfaction, and minimizing deadlocks.\u0000Additionally, we demonstrate strong generalization across scenarios with\u0000varying agent counts, scaling up to 1000 agents.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"39 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142258628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Nudges for Conditional Cooperation: A Multi-Agent Reinforcement Learning Model","authors":"Shatayu Kulkarni, Sabine Brunswicker","doi":"arxiv-2409.09509","DOIUrl":"https://doi.org/arxiv-2409.09509","url":null,"abstract":"The public goods game describes a social dilemma in which a large proportion\u0000of agents act as conditional cooperators (CC): they only act cooperatively if\u0000they see others acting cooperatively because they satisfice with the social\u0000norm to be in line with what others are doing instead of optimizing\u0000cooperation. CCs are guided by aspiration-based reinforcement learning guided\u0000by past experiences of interactions with others and satisficing aspirations. In\u0000many real-world settings, reinforcing social norms do not emerge. In this\u0000paper, we propose that an optimizing reinforcement agent can facilitate\u0000cooperation through nudges, i.e. indirect mechanisms for cooperation to happen.\u0000The agent's goal is to motivate CCs into cooperation through its own actions to\u0000create social norms that signal that others are cooperating. We introduce a\u0000multi-agent reinforcement learning model for public goods games, with 3 CC\u0000learning agents using aspirational reinforcement learning and 1 nudging agent\u0000using deep reinforcement learning to learn nudges that optimize cooperation.\u0000For our nudging agent, we model two distinct reward functions, one maximizing\u0000the total game return (sum DRL) and one maximizing the number of cooperative\u0000contributions contributions higher than a proportional threshold (prop DRL).\u0000Our results show that our aspiration-based RL model for CC agents is consistent\u0000with empirically observed CC behavior. Games combining 3 CC RL agents and one\u0000nudging RL agent outperform the baseline consisting of 4 CC RL agents only. The\u0000sum DRL nudging agent increases the total sum of contributions by 8.22% and the\u0000total proportion of cooperative contributions by 12.42%, while the prop nudging\u0000DRL increases the total sum of contributions by 8.85% and the total proportion\u0000of cooperative contributions by 14.87%. Our findings advance the literature on\u0000public goods games and reinforcement learning.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"208 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}