{"title":"从成功或不成功的经验中学习?","authors":"Keum Joo Kim, Eugene Santos","doi":"10.1177/21695067231192528","DOIUrl":null,"url":null,"abstract":"Humans learn from both successful and unsuccessful experiences, because useful information about how to solve complex problems can be gleaned not only from success but also from failure. In this paper, we propose a method for investigating this difference by applying Preference based Inverse Reinforcement Learning to Double Transition Models built from replays of StarCraft II. Our method provides two advantages: (1) the ability to identify integrated reward distributions from computational models composed of multiple experiences, and (2) the ability to discern differences between learning by successes and failures. Our experimental results demonstrate that reward distributions are shaped depending on the trajectories utilized to build models. Reward distributions based on successful episodes were skewed to the left, while those based on unsuccessful episodes were skewed to the right. Furthermore, we found that players with symmetric triple reward distributions had a high probability of winning the game.","PeriodicalId":74544,"journal":{"name":"Proceedings of the Human Factors and Ergonomics Society ... Annual Meeting. Human Factors and Ergonomics Society. Annual meeting","volume":"33 8","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning by Successful or Unsuccessful Experiences?\",\"authors\":\"Keum Joo Kim, Eugene Santos\",\"doi\":\"10.1177/21695067231192528\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Humans learn from both successful and unsuccessful experiences, because useful information about how to solve complex problems can be gleaned not only from success but also from failure. In this paper, we propose a method for investigating this difference by applying Preference based Inverse Reinforcement Learning to Double Transition Models built from replays of StarCraft II. Our method provides two advantages: (1) the ability to identify integrated reward distributions from computational models composed of multiple experiences, and (2) the ability to discern differences between learning by successes and failures. Our experimental results demonstrate that reward distributions are shaped depending on the trajectories utilized to build models. Reward distributions based on successful episodes were skewed to the left, while those based on unsuccessful episodes were skewed to the right. Furthermore, we found that players with symmetric triple reward distributions had a high probability of winning the game.\",\"PeriodicalId\":74544,\"journal\":{\"name\":\"Proceedings of the Human Factors and Ergonomics Society ... Annual Meeting. Human Factors and Ergonomics Society. Annual meeting\",\"volume\":\"33 8\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Human Factors and Ergonomics Society ... Annual Meeting. Human Factors and Ergonomics Society. 
Annual meeting\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1177/21695067231192528\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Human Factors and Ergonomics Society ... Annual Meeting. Human Factors and Ergonomics Society. Annual meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1177/21695067231192528","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Learning by Successful or Unsuccessful Experiences?
Humans learn from both successful and unsuccessful experiences: useful information about how to solve complex problems can be gleaned not only from success but also from failure. In this paper, we propose a method for investigating this difference by applying Preference-based Inverse Reinforcement Learning to Double Transition Models built from replays of StarCraft II. Our method provides two advantages: (1) it identifies integrated reward distributions from computational models composed of multiple experiences, and (2) it discerns differences between learning from successes and learning from failures. Our experimental results demonstrate that the shape of the recovered reward distribution depends on the trajectories used to build the model: distributions based on successful episodes were skewed to the left, while those based on unsuccessful episodes were skewed to the right. Furthermore, we found that players with symmetric triple reward distributions had a high probability of winning the game.
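The abstract names the key algorithmic ingredients, preference-based inverse reinforcement learning (IRL) and the skewness of the recovered reward distribution, but gives no implementation details. The following is only a minimal, self-contained sketch of one common realization of preference-based IRL: a Bradley-Terry preference model with a linear reward over trajectory features, fit by gradient ascent, followed by a skewness check on the learned rewards. The data, feature dimensions, and helper names (make_trajectory, traj_features, etc.) are hypothetical stand-ins; this is not the authors' Double Transition Model or StarCraft II replay pipeline.

```python
# Sketch of preference-based IRL via a Bradley-Terry model.
# All names and data here are illustrative assumptions, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)

def make_trajectory(length=20, dim=4):
    """Hypothetical trajectory: a sequence of state feature vectors.
    In the paper's setting these would come from StarCraft II replays."""
    return rng.normal(size=(length, dim))

def traj_features(traj):
    """Sum state features over a trajectory (a common linear-reward choice)."""
    return traj.sum(axis=0)

def traj_return(traj, w):
    """Return of a trajectory under a linear reward model with weights w."""
    return traj_features(traj) @ w

trajectories = [make_trajectory() for _ in range(40)]

# Preference pairs (i, j) mean trajectory i is preferred over j. Here they
# are generated from a hidden 'true' weight vector; in practice the labels
# would come from outcomes such as won vs. lost games.
w_true = rng.normal(size=4)
pairs = []
for _ in range(200):
    i, j = rng.choice(len(trajectories), size=2, replace=False)
    if traj_return(trajectories[i], w_true) > traj_return(trajectories[j], w_true):
        pairs.append((i, j))
    else:
        pairs.append((j, i))

# Fit reward weights by gradient ascent on the Bradley-Terry log-likelihood:
# P(i preferred over j) = sigmoid(R(i) - R(j)).
w = np.zeros(4)
lr = 0.01
for _ in range(500):
    grad = np.zeros(4)
    for i, j in pairs:
        fi, fj = traj_features(trajectories[i]), traj_features(trajectories[j])
        p = 1.0 / (1.0 + np.exp(-(fi - fj) @ w))
        grad += (1.0 - p) * (fi - fj)
    w += lr * grad / len(pairs)

# Inspect the learned reward distribution over trajectories, including its
# skewness -- the statistic the abstract uses to contrast success and failure.
rewards = np.array([traj_return(t, w) for t in trajectories])
skew = ((rewards - rewards.mean()) ** 3).mean() / rewards.std() ** 3
print(f"learned reward distribution: mean={rewards.mean():.3f}, skewness={skew:.3f}")
```

In the paper's setting, the preference labels would distinguish successful from unsuccessful episodes, and the sign of the skewness statistic (left- vs. right-skewed) is what separates the two resulting reward distributions.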