Candy A. Huanca-Anquise, A. Bazzan, Anderson R. Tavares
{"title":"多目标多武装土匪:重复博弈算法及其在路径选择中的应用","authors":"Candy A. Huanca-Anquise, A. Bazzan, Anderson R. Tavares","doi":"10.22456/2175-2745.122929","DOIUrl":null,"url":null,"abstract":"Multi-objective decision-making in multi-agent scenarios poses multiple challenges. Dealing with multiple objectives and non-stationarity caused by simultaneous learning are only two of them, which have been addressed separately. In this work, reinforcement learning algorithms that tackle both issues together are proposed and applied to a route choice problem, where drivers must select an action in a single-state formulation, while aiming to minimize both their travel time and toll. Hence, we deal with repeated games, now with a multi-objective approach. Advantages, limitations and differences of these algorithms are discussed. Our results show that the proposed algorithms for action selection using reinforcement learning deal with non-stationarity and multiple objectives, while providing alternative solutions to those of centralized methods.","PeriodicalId":53421,"journal":{"name":"Revista de Informatica Teorica e Aplicada","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Objective, Multi-Armed Bandits: Algorithms for Repeated Games and Application to Route Choice\",\"authors\":\"Candy A. Huanca-Anquise, A. Bazzan, Anderson R. Tavares\",\"doi\":\"10.22456/2175-2745.122929\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multi-objective decision-making in multi-agent scenarios poses multiple challenges. Dealing with multiple objectives and non-stationarity caused by simultaneous learning are only two of them, which have been addressed separately. In this work, reinforcement learning algorithms that tackle both issues together are proposed and applied to a route choice problem, where drivers must select an action in a single-state formulation, while aiming to minimize both their travel time and toll. Hence, we deal with repeated games, now with a multi-objective approach. Advantages, limitations and differences of these algorithms are discussed. Our results show that the proposed algorithms for action selection using reinforcement learning deal with non-stationarity and multiple objectives, while providing alternative solutions to those of centralized methods.\",\"PeriodicalId\":53421,\"journal\":{\"name\":\"Revista de Informatica Teorica e Aplicada\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Revista de Informatica Teorica e Aplicada\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.22456/2175-2745.122929\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Computer Science\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Revista de Informatica Teorica e Aplicada","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.22456/2175-2745.122929","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Computer Science","Score":null,"Total":0}
Multi-Objective, Multi-Armed Bandits: Algorithms for Repeated Games and Application to Route Choice
Multi-objective decision-making in multi-agent scenarios poses multiple challenges. Dealing with multiple objectives and non-stationarity caused by simultaneous learning are only two of them, which have been addressed separately. In this work, reinforcement learning algorithms that tackle both issues together are proposed and applied to a route choice problem, where drivers must select an action in a single-state formulation, while aiming to minimize both their travel time and toll. Hence, we deal with repeated games, now with a multi-objective approach. Advantages, limitations and differences of these algorithms are discussed. Our results show that the proposed algorithms for action selection using reinforcement learning deal with non-stationarity and multiple objectives, while providing alternative solutions to those of centralized methods.