M. Mashayekhi, Nirav Ajmeri, G. List, Munindar P. Singh
{"title":"Prosocial Norm Emergence in Multi-agent Systems","authors":"M. Mashayekhi, Nirav Ajmeri, G. List, Munindar P. Singh","doi":"10.1145/3540202","DOIUrl":"https://doi.org/10.1145/3540202","url":null,"abstract":"Multi-agent systems provide a basis for developing systems of autonomous entities and thus find application in a variety of domains. We consider a setting where not only the member agents are adaptive but also the multi-agent system viewed as an entity in its own right is adaptive. Specifically, the social structure of a multi-agent system can be reflected in the social norms among its members. It is well recognized that the norms that arise in society are not always beneficial to its members. We focus on prosocial norms, which help achieve positive outcomes for society and often provide guidance to agents to act in a manner that takes into account the welfare of others. Specifically, we propose Cha, a framework for the emergence of prosocial norms. Unlike previous norm emergence approaches, Cha supports continual change to a system (agents may enter and leave) and dynamism (norms may change when the environment changes). Importantly, Cha agents incorporate prosocial decision-making based on inequity aversion theory, reflecting an intuition of guilt arising from being antisocial. In this manner, Cha brings together two important themes in prosociality: decision-making by individuals and fairness of system-level outcomes. 
We demonstrate via simulation that Cha can improve aggregate societal gains and fairness of outcomes.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123965992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint","authors":"Changxi Zhu, Ho-fung Leung, Shuyue Hu, Yi Cai","doi":"10.1145/3447268","DOIUrl":"https://doi.org/10.1145/3447268","url":null,"abstract":"In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. 
Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122824670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ahmad Esmaeili, J. Gallagher, John A. Springer, E. Matson
{"title":"HAMLET: A Hierarchical Agent-based Machine Learning Platform","authors":"Ahmad Esmaeili, J. Gallagher, John A. Springer, E. Matson","doi":"10.1145/3530191","DOIUrl":"https://doi.org/10.1145/3530191","url":null,"abstract":"Hierarchical Multi-agent Systems provide convenient and relevant ways to analyze, model, and simulate complex systems composed of a large number of entities that interact at different levels of abstraction. In this article, we introduce HAMLET (Hierarchical Agent-based Machine LEarning plaTform), a hybrid machine learning platform based on hierarchical multi-agent systems, to facilitate the research and democratization of geographically and/or locally distributed machine learning entities. The proposed system models machine learning solutions as a hypergraph and autonomously sets up a multi-level structure of heterogeneous agents based on their innate capabilities and learned skills. HAMLET aids the design and management of machine learning systems and provides analytical capabilities for research communities to assess the existing and/or new algorithms/datasets through flexible and customizable queries. The proposed hybrid machine learning platform does not assume restrictions on the type of learning algorithms/datasets and is theoretically proven to be sound and complete with polynomial computational requirements. Additionally, it is examined empirically on 120 training and 4 generalized batch testing tasks performed on 24 machine learning algorithms and 9 standard datasets. 
The provided experimental results not only establish confidence in the platform’s consistency and correctness but also demonstrate its testing and analytical capacity.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129490076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applying Machine Learning in Self-adaptive Systems","authors":"Omid Gheibi, Danny Weyns, Federico Quin","doi":"10.1145/3469440","DOIUrl":"https://doi.org/10.1145/3469440","url":null,"abstract":"Recently, we have been witnessing a rapid increase in the use of machine learning techniques in self-adaptive systems. Machine learning has been used for a variety of reasons, ranging from learning a model of the environment of a system during operation to filtering large sets of possible configurations before analyzing them. While a body of work on the use of machine learning in self-adaptive systems exists, there is currently no systematic overview of this area. Such an overview is important for researchers to understand the state of the art and direct future research efforts. This article reports the results of a systematic literature review that aims at providing such an overview. We focus on self-adaptive systems that are based on a traditional Monitor-Analyze-Plan-Execute (MAPE)-based feedback loop. The research questions are centered on the problems that motivate the use of machine learning in self-adaptive systems, the key engineering aspects of learning in self-adaptation, and open challenges in this area. The search resulted in 6,709 papers, of which 109 were retained for data collection. Analysis of the collected data shows that machine learning is mostly used for updating adaptation rules and policies to improve system qualities, and managing resources to better balance qualities and resources. These problems are primarily solved using supervised and interactive learning with classification, regression, and reinforcement learning as the dominant methods. Surprisingly, unsupervised learning that naturally fits automation is only applied in a small number of studies. Key open challenges in this area include the performance of learning, managing the effects of learning, and dealing with more complex types of goals. 
From the insights derived from this systematic literature review, we outline an initial design process for applying machine learning in self-adaptive systems that are based on MAPE feedback loops.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126707319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zhaopin Su, Guofu Zhang, Feng Yue, Jindong He, M. Li, Bin Li, X. Yao
{"title":"Finding the Largest Successful Coalition under the Strict Goal Preferences of Agents","authors":"Zhaopin Su, Guofu Zhang, Feng Yue, Jindong He, M. Li, Bin Li, X. Yao","doi":"10.1145/3412370","DOIUrl":"https://doi.org/10.1145/3412370","url":null,"abstract":"Coalition formation has been a fundamental form of resource cooperation for achieving joint goals in multiagent systems. Most existing studies still focus on the traditional assumption that an agent has to contribute its resources to all the goals, even if the agent is not interested in the goal at all. In this article, a natural extension of the traditional coalitional resource games (CRGs) is studied from both theoretical and empirical perspectives, in which each agent has uncompromising, personalized preferences over goals. Specifically, a new CRGs model with agents’ strict preferences for goals is presented, in which an agent is willing to contribute its resources only to the goals that are in its own interest set. The computational complexity of the basic decision problems surrounding the successful coalition is reinvestigated. The results suggest that these problems in such a strict preference way are complex and intractable. To find the largest successful coalition for possible computation reduction or potential parallel processing, a flow-network–based exhaust algorithm, called FNetEA, is proposed to achieve the optimal solution. Then, to solve the problem more efficiently, a hybrid algorithm, named 2D-HA, is developed to find the approximately optimal solution on the basis of genetic algorithm, two-dimensional (2D) solution representation, and a heuristic for solution repairs. 
Through extensive experiments, the 2D-HA algorithm exhibits the prominent ability to provide reassurances that the optimal solution could be found within a reasonable period of time, even in a super-large-scale space.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132891859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Feedback as Action Assignment in Interactive Reinforcement Learning","authors":"S. Raza, Mary-Anne Williams","doi":"10.1145/3404197","DOIUrl":"https://doi.org/10.1145/3404197","url":null,"abstract":"Teaching by demonstrations and teaching by assigning rewards are two popular methods of knowledge transfer in humans. However, showing the right behaviour (by demonstration) may appear more natural to a human teacher than assessing the learner’s performance and assigning a reward or punishment to it. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method that replaces the traditional method of reward assignment with action assignment (which is similar to providing a demonstration) in interactive reinforcement learning. The main purpose of the suggested action is to compute a reward by seeing if the suggested action was followed by the self-acting agent or not. We compared action assignment with reward assignment via a user study conducted over the web using a two-dimensional maze game. The logs of interactions showed that action assignment significantly improved users’ ability to teach the right behaviour. 
The survey results showed that both action and reward assignment seemed highly natural and usable, reward assignment required more mental effort, repeatedly assigning rewards and seeing the agent disobey commands caused frustration in users, and many users desired to control the agent’s behaviour directly.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124870560","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UAVs vs. Pirates","authors":"Ruiwen Zhang, T. Holvoet, Bifeng Song, Y. Pei","doi":"10.1145/3380782","DOIUrl":"https://doi.org/10.1145/3380782","url":null,"abstract":"For the rising hazard of pirate attacks, unmanned aerial vehicle (UAV) swarm monitoring is a promising countermeasure. Previous monitoring methods have deficiencies in either adaptivity to dynamic events or simple but effective path coordination mechanisms, and they are inapplicable to the large-area, low-target-density, and long-duration persistent counter-piracy monitoring. This article proposes a self-organized UAV swarm counter-piracy monitoring method. Based on the pheromone map, this method is characterized by (1) a reservation mechanism for anticipatory path coordination and (2) a ship-adaptive mechanism for adapting to merchant ship distributions. A heuristic depth-first branch and bound search algorithm is designed for solving individual path planning. Simulation experiments are conducted to study the optimal number of plan steps and adaptivity scaling factor for different numbers of UAVs. Results show that merely decreasing revisit intervals cannot effectively reduce pirate attacks. 
Without the ship-adaptive mechanism, the proposed method reduces revisit intervals by up to 87.2%, 43.2%, and 5.5% compared to the Lévy walk method, the sweep method, and the baseline self-organized method, respectively, but cannot reduce pirate attacks; while with the ship-adaptive mechanism, the proposed method can reduce pirate attacks by up to 6.7% compared to the best of the baseline methods.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115969279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federico Chiariotti, Chiara Pielli, A. Zanella, M. Zorzi
{"title":"A Bike-sharing Optimization Framework Combining Dynamic Rebalancing and User Incentives","authors":"Federico Chiariotti, Chiara Pielli, A. Zanella, M. Zorzi","doi":"10.1145/3376923","DOIUrl":"https://doi.org/10.1145/3376923","url":null,"abstract":"Bike-sharing systems have become an established reality in cities all across the world and are a key component of the Smart City paradigm. However, the unbalanced traffic patterns during rush hours can completely empty some stations, while filling others, and the service becomes unavailable for further users. The traditional approach to solve this problem is to use rebalancing trucks, which take bikes from full stations and deposit them at empty ones, reducing the likelihood of system outages. Another paradigm that is gaining steam is gamification, i.e., incentivizing users to fix the system by influencing their behavior with rewards and prizes. In this work, we combine the two efforts and show that a joint optimization considering both rebalancing and incentives results in a higher service quality for a lower cost than using simple rebalancing. We use simulations based on the New York CitiBike usage data to validate our model and analyze several schemes to optimize the bike-sharing system.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126888932","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Scalability and Reward of Utility-Driven Self-Healing for Large Dynamic Architectures","authors":"Sona Ghahremani, H. Giese, T. Vogel","doi":"10.1145/3380965","DOIUrl":"https://doi.org/10.1145/3380965","url":null,"abstract":"Self-adaptation can be realized in various ways. Rule-based approaches prescribe the adaptation to be executed if the system or environment satisfies certain conditions. They result in scalable solutions but often with merely satisfying adaptation decisions. In contrast, utility-driven approaches determine optimal decisions by using an often costly optimization, which typically does not scale for large problems. We propose a rule-based and utility-driven adaptation scheme that achieves the benefits of both directions such that the adaptation decisions are optimal, whereas the computation scales by avoiding an expensive optimization. We use this adaptation scheme for architecture-based self-healing of large software systems. For this purpose, we define the utility for large dynamic architectures of such systems based on patterns that define issues the self-healing must address. Moreover, we use pattern-based adaptation rules to resolve these issues. Using a pattern-based scheme to define the utility and adaptation rules allows us to compute the impact of each rule application on the overall utility and to realize an incremental and efficient utility-driven self-healing. In addition to formally analyzing the computational effort and optimality of the proposed scheme, we thoroughly demonstrate its scalability and optimality in terms of reward in comparative experiments with a static rule-based approach as a baseline and a utility-driven approach using a constraint solver. These experiments are based on different failure profiles derived from real-world failure logs. 
We also investigate the impact of different failure profile characteristics on the scalability and reward to evaluate the robustness of the different approaches.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117086088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Argumentation-Based Reasoning about Plans, Maintenance Goals, and Norms","authors":"Z. Shams, M. B. Vos, Nir Oren, J. Padget","doi":"10.1145/3364220","DOIUrl":"https://doi.org/10.1145/3364220","url":null,"abstract":"In a normative environment, an agent’s actions are directed not only by its goals but also by the norms activated by its actions and those of other actors. The potential for conflict between agent goals and norms makes decision making challenging, in that it requires looking ahead to consider the longer-term consequences of which goal to satisfy or which norm to comply with in the face of conflict. We therefore seek to determine the actions an agent should select at each point in time, taking account of its temporal goals, norms, and their conflicts. We propose a solution in which a normative planning problem is the basis for practical reasoning based on argumentation. Various types of conflict within goals, within norms, and between goals and norms are identified based on temporal properties of these entities. The properties of the best plan(s) with respect to goal achievement and norm compliance are mapped to arguments, followed by mapping their conflicts to attack between arguments, all of which are used to identify why a plan is justified.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133823544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}