{"title":"从不同角度学习,减少强化学习中的遗憾:自由能方法","authors":"Milad Ghorbani, Reshad Hosseini, Seyed Pooya Shariatpanahi, Majid Nili Ahmadabadi","doi":"10.1016/j.neucom.2024.128797","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement learning (RL) is the core method for interactive learning in living and artificial creatures. Nevertheless, in contrast to humans and animals, artificial RL agents are very slow in learning and suffer from the curse of dimensionality. This is partially due to using RL in isolation; i.e. lack of social learning and social diversity. We introduce a free energy-based social RL for learning novel tasks. Society is formed by the learning agent and some diverse virtual ones. That diversity is in their perception while all agents use the same interaction samples for learning and share the same action set. Individual difference in perception is mostly the cause of perceptual aliasing however, it can result in virtual agents’ faster learning in early trials. Our free energy method provides a knowledge integration method for the main agent to benefit from that diversity to reduce its regret. It rests upon Thompson sampling policy and behavioral policy of main and virtual agents. Therefore, it is applicable to a variety of tasks, discrete or continuous state space, model-free, and model-based tasks as well as to different reinforcement learning methods. Through a set of experiments, we show that this general framework highly improves learning speed and is clearly superior to previous existing methods. We also provide convergence proof.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"614 ","pages":"Article 128797"},"PeriodicalIF":5.5000,"publicationDate":"2024-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning from different perspectives for regret reduction in reinforcement learning: A free energy approach\",\"authors\":\"Milad Ghorbani, Reshad Hosseini, Seyed Pooya Shariatpanahi, Majid Nili Ahmadabadi\",\"doi\":\"10.1016/j.neucom.2024.128797\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Reinforcement learning (RL) is the core method for interactive learning in living and artificial creatures. Nevertheless, in contrast to humans and animals, artificial RL agents are very slow in learning and suffer from the curse of dimensionality. This is partially due to using RL in isolation; i.e. lack of social learning and social diversity. We introduce a free energy-based social RL for learning novel tasks. Society is formed by the learning agent and some diverse virtual ones. That diversity is in their perception while all agents use the same interaction samples for learning and share the same action set. Individual difference in perception is mostly the cause of perceptual aliasing however, it can result in virtual agents’ faster learning in early trials. Our free energy method provides a knowledge integration method for the main agent to benefit from that diversity to reduce its regret. It rests upon Thompson sampling policy and behavioral policy of main and virtual agents. Therefore, it is applicable to a variety of tasks, discrete or continuous state space, model-free, and model-based tasks as well as to different reinforcement learning methods. Through a set of experiments, we show that this general framework highly improves learning speed and is clearly superior to previous existing methods. 
We also provide convergence proof.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"614 \",\"pages\":\"Article 128797\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2024-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231224015686\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231224015686","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Learning from different perspectives for regret reduction in reinforcement learning: A free energy approach
Reinforcement learning (RL) is the core method for interactive learning in living and artificial creatures. Nevertheless, in contrast to humans and animals, artificial RL agents learn very slowly and suffer from the curse of dimensionality. This is partially due to using RL in isolation, i.e., without social learning and social diversity. We introduce a free energy-based social RL framework for learning novel tasks. The society is formed by the learning agent and a set of diverse virtual agents. The diversity lies in their perception, while all agents learn from the same interaction samples and share the same action set. Individual differences in perception are the main cause of perceptual aliasing; however, they can also let some virtual agents learn faster in early trials. Our free energy method gives the main agent a knowledge integration mechanism for exploiting that diversity to reduce its regret. It rests upon Thompson sampling and the behavioral policies of the main and virtual agents; it is therefore applicable to discrete or continuous state spaces, to model-free and model-based tasks, and to different reinforcement learning methods. Through a set of experiments, we show that this general framework greatly improves learning speed and is clearly superior to existing methods. We also provide a convergence proof.
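To make the high-level recipe concrete, the following is a minimal, hypothetical sketch of the kind of integration the abstract describes: several virtual agents, each perceiving the same experience through a different state abstraction, hold posteriors over action values; the main agent Thompson-samples from each posterior and combines the samples with a Boltzmann (free-energy-style) weighting. The agent count, Gaussian posteriors, and the exact weighting rule are illustrative assumptions, not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_ACTIONS = 3, 4  # diverse virtual agents, shared action set

# Each agent keeps a Gaussian posterior over action values (mean, variance),
# standing in for whatever value estimates its own perception produces.
means = rng.normal(0.0, 1.0, size=(N_AGENTS, N_ACTIONS))
variances = np.ones((N_AGENTS, N_ACTIONS))

def thompson_sample(means, variances):
    """Draw one value sample per (agent, action) from each posterior."""
    return rng.normal(means, np.sqrt(variances))

def free_energy_weights(samples, beta=1.0):
    """Softmax weights over agents based on their best sampled values;
    a stand-in for the paper's free-energy knowledge integration."""
    scores = samples.max(axis=1)             # each agent's optimistic value
    logits = beta * (scores - scores.max())  # shift for numerical stability
    w = np.exp(logits)
    return w / w.sum()

samples = thompson_sample(means, variances)   # one Thompson draw per agent
weights = free_energy_weights(samples)        # how much to trust each agent
combined = weights @ samples                  # integrated value per action
action = int(np.argmax(combined))             # main agent's action choice
print(f"weights={weights.round(3)}, action={action}")

In a full loop, each interaction sample would update every agent's posterior under its own perception, so agents that alias less in the current region of the state space would earn larger weights early on, which is the regret-reduction effect the abstract claims.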
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.