Fine-grained view on bribery for group identification
Niclas Boehmer, Robert Bredereck, Dušan Knop, Junjie Luo
Autonomous Agents and Multi-Agent Systems 37(1). Published 2023-03-24. DOI: 10.1007/s10458-023-09597-7. Open access: https://link.springer.com/content/pdf/10.1007/s10458-023-09597-7.pdf

Abstract: Given a set of agents qualifying or disqualifying each other, group identification is the task of identifying a socially qualified subgroup of agents. Social qualification depends on the specific rule used to aggregate individual qualifications. The classical bribery problem in this context asks how many agents need to change their qualifications in order to change the outcome in a certain way. Complementing previous results showing polynomial-time solvability or NP-hardness of bribery for various social rules in the constructive setting (aiming at making specific agents socially qualified) or the destructive setting (aiming at making specific agents socially disqualified), we provide a comprehensive picture of the parameterized computational complexity landscape. Conceptually, we also consider a more fine-grained concept of bribery cost, where we ask how many single qualifications need to be changed, allow non-unit prices for different bribery actions, and study a more general bribery goal that combines the constructive and destructive settings.
Electoral manipulation via influence: probabilistic model
Liangde Tao, Lin Chen, Lei Xu, Shouhuai Xu, Zhimin Gao, Weidong Shi
Autonomous Agents and Multi-Agent Systems 37(1). Published 2023-03-08. DOI: 10.1007/s10458-023-09602-z

Abstract: We consider a natural generalization of the fundamental electoral manipulation problem, in which a briber can change the opinions or preferences of voters through influence. This is motivated by modern political campaigns, where candidates try to convince voters through media such as TV, newspapers, and the Internet. In contrast to the classical bribery problem, we do not assume that the briber directly exchanges money for the votes of individual voters; instead, the briber has a set of potential campaign strategies. Each campaign strategy represents some way of exerting influence on voters, has a cost, and can influence a subset of voters. If a voter belongs to the audience of a campaign strategy, then he or she is influenced, and a voter is more likely to change his or her opinion or preference after receiving influence from a larger number of campaign strategies. We model this through an independent activation model, which is widely adopted in social-science research, and study the computational complexity of the resulting problem. In this paper, we give a full characterization by showing NP-hardness results and establishing a near-optimal fixed-parameter tractable algorithm that returns a solution arbitrarily close to the optimal one.
{"title":"Quantifying over information change with common knowledge","authors":"Thomas Ågotnes, Rustam Galimullin","doi":"10.1007/s10458-023-09601-0","DOIUrl":"10.1007/s10458-023-09601-0","url":null,"abstract":"<div><p>Public announcement logic (PAL) extends multi-agent epistemic logic with dynamic operators modelling the effects of public communication. Allowing quantification over public announcements lets us reason about the <i>existence</i> of an announcement that reaches a certain epistemic goal. Two notable examples of logics of quantified announcements are arbitrary public announcement logic (APAL) and group announcement logic (GAL). While the notion of common knowledge plays an important role in PAL, and in particular in characterisations of epistemic states that an agent or a group of agents might make come about by performing public announcements, extensions of APAL and GAL with common knowledge still haven’t been studied in detail. That is what we do in this paper. In particular, we consider both conservative extensions, where the semantics of the quantifiers is not changed, as well as extensions where the scope of quantification also includes common knowledge formulas. We compare the expressivity of these extensions relative to each other and other connected logics, and provide sound and complete axiomatisations. Finally, we show how the completeness results can be used for other logics with quantification over information change.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2023-03-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-023-09601-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42797156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accelerating deep reinforcement learning via knowledge-guided policy network
Yuanqiang Yu, Peng Zhang, Kai Zhao, Yan Zheng, Jianye Hao
Autonomous Agents and Multi-Agent Systems 37(1). Published 2023-02-18. DOI: 10.1007/s10458-023-09600-1

Abstract: Deep reinforcement learning has contributed to dramatic advances in many tasks, such as playing games, controlling robots, and navigating complex environments, but it requires many interactions with the environment. This differs from the human learning process: humans can draw on prior knowledge, which significantly speeds up learning by avoiding unnecessary exploration. Previous works integrating knowledge into RL did not model the uncertainty in human cognition, which reduces the reliability of that knowledge. In this paper, we propose a knowledge-guided policy network, a novel framework that combines suboptimal human knowledge with reinforcement learning. Our framework consists of a fuzzy-rule controller representing human knowledge and a refining module that fine-tunes the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing reinforcement learning algorithms such as PPO, AC, and SAC. We conduct experiments on both discrete and continuous control tasks. The empirical results show that our approach, which combines suboptimal human knowledge with RL, significantly improves the learning efficiency of basic RL algorithms, even when the human prior knowledge performs poorly on its own. Additional experiments on the number of fuzzy rules and on the interpretability of the policy round out the evaluation of the proposed framework. The code for this research is released at https://github.com/yuyuanq/reinforcement-learning-using-knowledge-controller.
Egalitarian judgment aggregation
Sirin Botan, Ronald de Haan, Marija Slavkovik, Zoi Terzopoulou
Autonomous Agents and Multi-Agent Systems 37(1). Published 2023-02-02. DOI: 10.1007/s10458-023-09598-6. Open access: https://link.springer.com/content/pdf/10.1007/s10458-023-09598-6.pdf

Abstract: Egalitarian considerations play a central role in many areas of social choice theory. Applications of egalitarian principles range from ensuring everyone gets an equal share of a cake when deciding how to divide it, to guaranteeing balance with respect to gender or ethnicity in committee elections. Yet, the egalitarian approach has received little attention in judgment aggregation, a powerful framework for aggregating logically interconnected issues. We make the first steps towards filling that gap. We introduce axioms capturing two classical interpretations of egalitarianism in judgment aggregation and situate these within the context of existing axioms in the pertinent framework of belief merging. We then explore the relationship between these axioms and several notions of strategyproofness from social choice theory at large. Finally, a novel egalitarian judgment aggregation rule stems from our analysis; we present complexity results concerning both outcome determination and strategic manipulation for that rule.
Online Markov decision processes with non-oblivious strategic adversary
Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang
Autonomous Agents and Multi-Agent Systems 37(1). Published 2023-01-27. DOI: 10.1007/s10458-023-09599-5

Abstract: We study a novel setting in online Markov decision processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, can still be applied and achieves a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)}+\tau^2\sqrt{T\log(|A|)})$, where $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space. Considering real-world games where the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)}+\tau^2\sqrt{Tk\log(k)})$, where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and can thus solve games with prohibitively large action spaces. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external-regret adversary in OMDPs, we introduce an algorithm that achieves last-round convergence to an NE. To the best of our knowledge, this is the first last-iterate convergence result in OMDPs.
Learning by reusing previous advice: a memory-based teacher–student framework
Changxi Zhu, Yi Cai, Shuyue Hu, Ho-fung Leung, Dickson K. W. Chiu
Autonomous Agents and Multi-Agent Systems 37(1). Published 2022-12-29. DOI: 10.1007/s10458-022-09595-1

Abstract: Reinforcement learning (RL) has been widely used to solve sequential decision-making problems. However, it often suffers from slow learning in complex scenarios. Teacher–student frameworks address this issue by enabling agents to ask for and give advice, so that a student agent can leverage the knowledge of a teacher agent to facilitate its learning. In this paper, we consider the effect of reusing previous advice and propose a novel memory-based teacher–student framework in which student agents memorize and reuse the advice previously received from teacher agents. In particular, we propose two methods to decide whether previous advice should be reused: Q-Change per Step, which reuses the advice if it leads to an increase in Q-values, and Decay Reusing Probability, which reuses the advice with a decaying probability. Experiments on diverse RL tasks (Mario, Predator–Prey and Half Field Offense) confirm that our proposed framework significantly outperforms existing frameworks in which previous advice is not reused.
{"title":"A refined complexity analysis of fair districting over graphs","authors":"Niclas Boehmer, Tomohiro Koana, Rolf Niedermeier","doi":"10.1007/s10458-022-09594-2","DOIUrl":"10.1007/s10458-022-09594-2","url":null,"abstract":"<div><p>We study the NP-hard <span>Fair Connected Districting</span> problem recently proposed by Stoica et al. [AAMAS 2020]: Partition a vertex-colored graph into <i>k</i> connected components (subsequently referred to as districts) so that in every district the most frequent color occurs at most a given number of times more often than the second most frequent color. <span>Fair Connected Districting</span> is motivated by various real-world scenarios where agents of different types, which are one-to-one represented by nodes in a network, have to be partitioned into disjoint districts. Herein, one strives for “fair districts” without any type being in a dominating majority in any of the districts. This is to e.g. prevent segregation or political domination of some political party. We conduct a fine-grained analysis of the (parameterized) computational complexity of <span>Fair Connected Districting</span>. In particular, we prove that it is polynomial-time solvable on paths, cycles, stars, and caterpillars, but already becomes NP-hard on trees. Motivated by the latter negative result, we perform a parameterized complexity analysis with respect to various graph parameters including treewidth, and problem-specific parameters, including, the numbers of colors and districts. We obtain a rich and diverse, close to complete picture of the corresponding parameterized complexity landscape (that is, a classification along the complexity classes FPT, XP, W[1]-hard, and para-NP-hard).</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-022-09594-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48797981","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preference-based multi-objective multi-agent path finding","authors":"Florence Ho, Shinji Nakadai","doi":"10.1007/s10458-022-09593-3","DOIUrl":"10.1007/s10458-022-09593-3","url":null,"abstract":"<div><p>Multi-Agent Path Finding (MAPF) is a well-studied problem that aims to generate collision-free paths for multiple agents while optimizing a single objective. However, many real-world applications require the consideration of multiple objectives. In this paper, we address a novel extension of MAPF, Multi-Objective MAPF (MOMAPF), that aims to optimize multiple given objectives while computing collision-free paths for all agents. In particular, we aim to incorporate the preferences of a decision maker over multi-agent path planning. Thus, we propose to solve a scalarized MOMAPF, whereby the given preferences of a decision maker are reflected by a weight value associated to each given objective and all weighted objectives are combined into one scalar. We introduce a solver for scalarized MOMAPF based on Conflict-Based Search (CBS) that incorporates an adapted path planner based on an evolutionary algorithm, the Genetic Algorithm (GA). We also introduce three practical objectives to consider in path planning: efficiency, safety, and smoothness. We evaluate the performance of our proposed method in function of the input parameters of GA on experimental simulations and we analyze its efficiency in providing conflict-free solutions within a fixed time.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"37 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2022-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-022-09593-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45269090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using soft maximin for risk averse multi-objective decision-making
Benjamin J. Smith, Robert Klassert, Roland Pihlakas
Autonomous Agents and Multi-Agent Systems 37(1). Published 2022-12-21. DOI: 10.1007/s10458-022-09586-2. Open access: https://link.springer.com/content/pdf/10.1007/s10458-022-09586-2.pdf

Abstract: Balancing multiple competing and conflicting objectives is an essential task for any artificial intelligence tasked with satisfying human values or preferences. Conflict arises both from misalignment between individuals with competing values and from conflicting value systems held by a single human. Starting from the principle of loss aversion, we designed a set of soft maximin function approaches to multi-objective decision-making. Benchmarking these functions in a set of previously developed environments, we found that one new approach in particular, 'split-function exp-log loss aversion' (SFELLA), learns faster than the state-of-the-art thresholded alignment objective method of Vamplew (Engineering Applications of Artificial Intelligence 100:104186, 2021) on three of the four tasks it was tested on, and achieves the same optimal performance after learning. SFELLA also showed relative robustness improvements against changes in objective scale, which may highlight an advantage in dealing with distribution shifts in the environment dynamics. We further compared SFELLA to the multi-objective reward exponentials (MORE) approach and found that SFELLA performs similarly to MORE in a simple, previously described foraging task; however, in a modified foraging environment with a new resource that is not depleted as the agent works, SFELLA collects more of the new resource at very little cost in terms of the old resource. Overall, we found SFELLA useful for avoiding problems that sometimes arise with a thresholded approach, and more reward-responsive than MORE while retaining its conservative, loss-averse incentive structure.