Title: Multi-trainer binary feedback interactive reinforcement learning
Authors: Zhaori Guo, Timothy J. Norman, Enrico H. Gerding
Journal: Annals of Mathematics and Artificial Intelligence, 93(4), 491–516
Publication date: 2024-10-02
DOI: 10.1007/s10472-024-09956-4
URL: https://link.springer.com/article/10.1007/s10472-024-09956-4
Citations: 0
Abstract
Interactive reinforcement learning is an effective way to train agents via human feedback. However, it often requires the trainer (a human who provides feedback to the agent) to know the correct action for the agent. If the trainer is not always reliable, incorrect feedback may hinder the agent's training. In addition, there is no consensus on the best form of human feedback in interactive reinforcement learning. To address these problems, in this paper, we explore the performance of binary rewards as the reward form. Moreover, we propose a novel interactive reinforcement learning system called Multi-Trainer Interactive Reinforcement Learning (MTIRL), which can aggregate binary feedback from multiple imperfect trainers into a reliable reward for agent training in a reward-sparse environment. In addition, the review model in MTIRL can correct unreliable rewards. In particular, our experiments evaluating reward forms show that the binary reward outperforms other reward forms, including the ranking reward, the scaling reward, and the state-value reward. In addition, our question-answer experiments show that our aggregation method outperforms state-of-the-art aggregation methods, including majority voting, weighted voting, and Bayesian aggregation. Finally, we conduct grid-world experiments to show that the policy trained by MTIRL with the review model is closer to the optimal policy than the one trained without it.
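The abstract compares the proposed aggregation against Bayesian aggregation, among other baselines. As an illustrative sketch only (not the authors' MTIRL method; the trainer reliabilities and the log-odds combination rule here are assumptions for exposition), binary votes from trainers of differing estimated reliability can be fused into a single probability that an action was correct:

```python
import math

def aggregate_binary_feedback(votes, reliabilities, prior=0.5):
    """Combine binary votes (+1 approve / -1 disapprove) from several
    trainers into the posterior probability that the action was correct.

    Each trainer's vote shifts the log-odds by the log-likelihood ratio
    implied by that trainer's estimated reliability (probability of
    voting correctly). Independent votes are assumed.
    """
    log_odds = math.log(prior / (1 - prior))
    for vote, p in zip(votes, reliabilities):
        llr = math.log(p / (1 - p))  # evidence weight of one correct vote
        log_odds += llr if vote == 1 else -llr
    return 1 / (1 + math.exp(-log_odds))  # sigmoid back to a probability

# Two fairly reliable trainers approve; one near-random trainer disapproves.
prob_good = aggregate_binary_feedback([1, 1, -1], [0.9, 0.8, 0.55])
reward = 1 if prob_good > 0.5 else -1  # collapse to a binary reward
```

Note that a trainer with reliability 0.5 contributes zero log-likelihood ratio, so random feedback is automatically ignored, which is the intuition behind reliability-weighted schemes outperforming plain majority voting.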
Journal description:
Annals of Mathematics and Artificial Intelligence presents a range of topics of concern to scholars applying quantitative, combinatorial, logical, algebraic and algorithmic methods to diverse areas of Artificial Intelligence, from decision support, automated deduction, and reasoning, to knowledge-based systems, machine learning, computer vision, robotics and planning.
The journal features collections of papers appearing either in volumes (400 pages) or in separate issues (100-300 pages), which focus on one topic and have one or more guest editors.
Annals of Mathematics and Artificial Intelligence aims to foster new areas of applied mathematics and to strengthen the scientific foundations of Artificial Intelligence.