Lukasz Janeczko, Jérôme Lang, Grzegorz Lisowski, Stanislaw Szufa
{"title":"Discovering Consistent Subelections","authors":"Lukasz Janeczko, Jérôme Lang, Grzegorz Lisowski, Stanislaw Szufa","doi":"10.5555/3635637.3662948","DOIUrl":"https://doi.org/10.5555/3635637.3662948","url":null,"abstract":"We show how hidden interesting subelections can be discovered in ordinal elections. An interesting subelection consists of a reasonably large set of voters and a reasonably large set of candidates such that the former have a consistent opinion about the latter. Consistency may take various forms but we focus on three: Identity (all selected voters rank all selected candidates the same way), antagonism (half of the selected voters rank candidates in some order and the other half in the reverse order), and clones (all selected voters rank all selected candidates contiguously in the original election). We first study the computation of such hidden subelections. Second, we analyze synthetic and real-life data, and find that identifying hidden consistent subelections allows us to uncover some relevant concepts.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"58 17","pages":"935-943"},"PeriodicalIF":0.0,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141799344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Piotr Faliszewski, Lukasz Janeczko, Andrzej Kaczmarczyk, Grzegorz Lisowski, P. Skowron, Stanislaw Szufa
{"title":"Strategic Cost Selection in Participatory Budgeting","authors":"Piotr Faliszewski, Lukasz Janeczko, Andrzej Kaczmarczyk, Grzegorz Lisowski, P. Skowron, Stanislaw Szufa","doi":"10.5555/3635637.3663125","DOIUrl":"https://doi.org/10.5555/3635637.3663125","url":null,"abstract":"We study strategic behavior of project proposers in the context of approval-based participatory budgeting (PB). In our model we assume that the votes are fixed and known and the proposers want to set as high project prices as possible, provided that their projects get selected and the prices are not below the minimum costs of their delivery. We study the existence of pure Nash equilibria (NE) in such games, focusing on the AV/Cost, Phragm'en, and Method of Equal Shares rules. Furthermore, we report an experimental study of strategic cost selection on real-life PB election data.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"4 10","pages":"2255-2257"},"PeriodicalIF":0.0,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141804386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Koyfman, Shahaf S. Shperberg, Dor Atzmon, Ariel Felner
{"title":"Minimizing State Exploration While Searching Graphs with Unknown Obstacles","authors":"Daniel Koyfman, Shahaf S. Shperberg, Dor Atzmon, Ariel Felner","doi":"10.5555/3635637.3662959","DOIUrl":"https://doi.org/10.5555/3635637.3662959","url":null,"abstract":"We address the challenge of finding a shortest path in a graph with unknown obstacles where the exploration cost to detect whether a state is free or blocked is very high (e.g., due to sensor activation for obstacle detection). The main objective is to solve the problem while minimizing the number of explorations. To achieve this, we propose MXA∗, a novel heuristic search algorithm based on A∗. The key innovation in MXA∗ lies in modifying the heuristic calculation to avoid obstacles that have already been revealed. Furthermore, this paper makes a noteworthy contribution by introducing the concept of a dynamic heuristic. In contrast to the conventional static heuristic, a dynamic heuristic leverages information that emerges during the search process and adapts its estimations accordingly. By employing a dynamic heuristic, we suggest enhancements to MXA∗ based on real-time information obtained from both the open and closed lists. We demonstrate empirically that MXA∗ finds the shortest path while significantly reducing the number of explored states compared to traditional A∗. The code is available at https: //github.com/bernuly1/MXA-Star.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"33 3","pages":"1038-1046"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141232947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie Fan
{"title":"vMFER: von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement of Actor-Critic Algorithms","authors":"Yiwen Zhu, Jinyi Liu, Wenya Wei, Qianyi Fu, Yujing Hu, Zhou Fang, Bo An, Jianye Hao, Tangjie Lv, Changjie Fan","doi":"10.5555/3635637.3663247","DOIUrl":"https://doi.org/10.5555/3635637.3663247","url":null,"abstract":"Reinforcement Learning (RL) is a widely employed technique in decision-making problems, encompassing two fundamental operations -- policy evaluation and policy improvement. Enhancing learning efficiency remains a key challenge in RL, with many efforts focused on using ensemble critics to boost policy evaluation efficiency. However, when using multiple critics, the actor in the policy improvement process can obtain different gradients. Previous studies have combined these gradients without considering their disagreements. Therefore, optimizing the policy improvement process is crucial to enhance learning efficiency. This study focuses on investigating the impact of gradient disagreements caused by ensemble critics on policy improvement. We introduce the concept of uncertainty of gradient directions as a means to measure the disagreement among gradients utilized in the policy improvement process. Through measuring the disagreement among gradients, we find that transitions with lower uncertainty of gradient directions are more reliable in the policy improvement process. Building on this analysis, we propose a method called von Mises-Fisher Experience Resampling (vMFER), which optimizes the policy improvement process by resampling transitions and assigning higher confidence to transitions with lower uncertainty of gradient directions. Our experiments demonstrate that vMFER significantly outperforms the benchmark and is particularly well-suited for ensemble structures in RL.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"60 5","pages":"2621-2623"},"PeriodicalIF":0.0,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140978793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinrun Wang, Chang Yang, Shuxin Li, Pengdeng Li, Xiao Huang, Hau Chan, Bo An
{"title":"Reinforcement Nash Equilibrium Solver","authors":"Xinrun Wang, Chang Yang, Shuxin Li, Pengdeng Li, Xiao Huang, Hau Chan, Bo An","doi":"10.5555/3635637.3663224","DOIUrl":"https://doi.org/10.5555/3635637.3663224","url":null,"abstract":"Nash Equilibrium (NE) is the canonical solution concept of game theory, which provides an elegant tool to understand the rationalities. Though mixed strategy NE exists in any game with finite players and actions, computing NE in two- or multi-player general-sum games is PPAD-Complete. Various alternative solutions, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), are proposed to approximate NE. For convenience, we call these methods as\"inexact solvers\", or\"solvers\"for short. However, the alternative solutions differ from NE and the learning methods generally fail to converge to NE. Therefore, in this work, we propose REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify the games with different sizes and applies the solvers on the modified games where the obtained solution is evaluated on the original games. Specifically, our contributions are threefold. i) We represent the games as $alpha$-rank response graphs and leverage graph neural network (GNN) to handle the games with different sizes as inputs; ii) We use tensor decomposition, e.g., canonical polyadic (CP), to make the dimension of modifying actions fixed for games with different sizes; iii) We train the modifying strategy for games with the widely-used proximal policy optimization (PPO) and apply the solvers to solve the modified games, where the obtained solution is evaluated on original games. Extensive experiments on large-scale normal-form games show that our method can further improve the approximation of NE of different solvers, i.e., $alpha$-rank, CE, FP and PRD, and can be generalized to unseen games.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"57 35","pages":"2552-2554"},"PeriodicalIF":0.0,"publicationDate":"2024-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141009450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Utility of External Agent Intention Predictor for Human-AI Coordination","authors":"Chenxu Wang, Zilong Chen, Huaping Liu","doi":"10.5555/3635637.3663222","DOIUrl":"https://doi.org/10.5555/3635637.3663222","url":null,"abstract":"Reaching a consensus on the team plans is vital to human-AI coordination. Although previous studies provide approaches through communications in various ways, it could still be hard to coordinate when the AI has no explainable plan to communicate. To cover this gap, we suggest incorporating external models to assist humans in understanding the intentions of AI agents. In this paper, we propose a two-stage paradigm that first trains a Theory of Mind (ToM) model from collected offline trajectories of the target agent, and utilizes the model in the process of human-AI collaboration by real-timely displaying the future action predictions of the target agent. Such a paradigm leaves the AI agent as a black box and thus is available for improving any agents. To test our paradigm, we further implement a transformer-based predictor as the ToM model and develop an extended online human-AI collaboration platform for experiments. The comprehensive experimental results verify that human-AI teams can achieve better performance with the help of our model. A user assessment attached to the experiment further demonstrates that our paradigm can significantly enhance the situational awareness of humans. Our study presents the potential to augment the ability of humans via external assistance in human-AI collaboration, which may further inspire future research.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"54 6","pages":"2546-2548"},"PeriodicalIF":0.0,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141016713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure","authors":"Zhicheng Zhang, Yancheng Liang, Yi Wu, Fei Fang","doi":"10.5555/3635637.3663073","DOIUrl":"https://doi.org/10.5555/3635637.3663073","url":null,"abstract":"Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to Pareto optimal Nash Equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, caused by the larger variance exhibited in policy learning. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to\"cover\"the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"9 5","pages":"2085-2093"},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141044404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual Role AoI-based Incentive Mechanism for HD map Crowdsourcing","authors":"Wentao Ye, Bo Liu, Yuan Luo, Jianwei Huang","doi":"10.5555/3635637.3663230","DOIUrl":"https://doi.org/10.5555/3635637.3663230","url":null,"abstract":"A high-quality fresh high-definition (HD) map is vital in enhancing transportation efficiency and safety in autonomous driving. Vehicle-based crowdsourcing offers a promising approach for updating HD maps. However, recruiting crowdsourcing vehicles involves making the challenging tradeoff between the HD map freshness and recruitment costs. Existing studies on HD map crowdsourcing often (1) prioritize maximizing spatial coverage and (2) overlook the dual role of crowdsourcing vehicles in HD maps, as vehicles serve both as contributors and customers of HD maps. This motivates us to propose the Dual-Role Age of Information (AoI) based Incentive Mechanism (DRAIM) to address these issues. % Specifically, we propose the trajectory age of information, incorporating the expected AoI of the HD map and the trajectory, to quantify a vehicle's HD map usage utility, which is freshness- and trajectory-dependent. DRAIM aims to achieve the company's tradeoff between freshness and recruitment costs.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"10 1","pages":"2570-2572"},"PeriodicalIF":0.0,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141025969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Damian Kurpiewski, Witold Pazderski, W. Jamroga, Yan Kim
{"title":"STV+Reductions: Towards Practical Verification of Strategic Ability Using Model Reductions","authors":"Damian Kurpiewski, Witold Pazderski, W. Jamroga, Yan Kim","doi":"10.5555/3463952.3464232","DOIUrl":"https://doi.org/10.5555/3463952.3464232","url":null,"abstract":"We present a substantially expanded version of our tool STV for strategy synthesis and verification of strategic abilities. The new version adds user-definable models and support for model reduction through partial order reduction and checking for bisimulation.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126828967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D. Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, G. Pérez
{"title":"Formally-Sharp DAgger for MCTS: Lower-Latency Monte Carlo Tree Search using Data Aggregation with Formal Methods","authors":"D. Chakraborty, Damien Busatto-Gaston, Jean-François Raskin, G. Pérez","doi":"10.5555/3545946.3598783","DOIUrl":"https://doi.org/10.5555/3545946.3598783","url":null,"abstract":"We study how to efficiently combine formal methods, Monte Carlo Tree Search (MCTS), and deep learning in order to produce high-quality receding horizon policies in large Markov Decision processes (MDPs). In particular, we use model-checking techniques to guide the MCTS algorithm in order to generate offline samples of high-quality decisions on a representative set of states of the MDP. Those samples can then be used to train a neural network that imitates the policy used to generate them. This neural network can either be used as a guide on a lower-latency MCTS online search, or alternatively be used as a full-fledged policy when minimal latency is required. We use statistical model checking to detect when additional samples are needed and to focus those additional samples on configurations where the learnt neural network policy differs from the (computationally-expensive) offline policy. We illustrate the use of our method on MDPs that model the Frozen Lake and Pac-Man environments -- two popular benchmarks to evaluate reinforcement-learning algorithms.","PeriodicalId":326727,"journal":{"name":"Adaptive Agents and Multi-Agent Systems","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127035829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}