W. B. Knox, P. Stone
2008 7th IEEE International Conference on Development and Learning, published 2008-10-10
DOI: 10.1109/DEVLRN.2008.4640845 · Cited by 136 (Semantic Scholar)
TAMER: Training an Agent Manually via Evaluative Reinforcement
Though computers have surpassed humans at many tasks, especially computationally intensive ones, there are many tasks for which human expertise remains necessary and/or useful. For such tasks, it is desirable for a human to be able to transmit knowledge to a learning agent as quickly and effortlessly as possible, and, ideally, without any knowledge of the details of the agent's learning process. This paper proposes a general framework called Training an Agent Manually via Evaluative Reinforcement (TAMER) that allows a human to train a learning agent to perform a common class of complex tasks simply by giving scalar reward signals in response to the agent's observed actions. Specifically, in sequential decision-making tasks, an agent models the human's reward function and chooses actions that it predicts will receive the most reward. Our novel algorithm is fully implemented and tested on the game Tetris. Leveraging the human trainers' feedback, the agent learns to clear an average of more than 50 lines by its third game, an order of magnitude faster than the best autonomous learning agents.
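The core idea described in the abstract — learn a model of the human's reward function from scalar feedback, then act greedily with respect to that model — can be sketched as follows. This is a minimal illustrative sketch, not the paper's Tetris implementation: the linear reward model, feature map, and learning rate are assumptions introduced here for clarity.

```python
import numpy as np

class TamerAgent:
    """Minimal sketch of the TAMER loop: model the human trainer's
    reward H_hat(s, a) and pick the action predicted to earn the most
    reward. The linear model and SGD update are illustrative choices."""

    def __init__(self, n_features, actions, lr=0.1):
        self.w = np.zeros(n_features)  # weights of the linear reward model
        self.actions = actions         # discrete action set
        self.lr = lr                   # learning rate for model updates

    def predict(self, features):
        """Predicted human reward for one state-action feature vector."""
        return self.w @ features

    def choose_action(self, feature_fn, state):
        """Act greedily: the action with the highest predicted human reward."""
        return max(self.actions,
                   key=lambda a: self.predict(feature_fn(state, a)))

    def update(self, features, human_reward):
        """Supervised update toward the scalar reward the human just gave
        for the observed action (squared-error gradient step)."""
        error = human_reward - self.predict(features)
        self.w += self.lr * error * features

# Toy usage: two actions with one-hot features; the trainer rewards
# action 0 and punishes action 1, so the agent learns to prefer 0.
agent = TamerAgent(n_features=2, actions=[0, 1])
feat = lambda s, a: np.array([1.0, 0.0]) if a == 0 else np.array([0.0, 1.0])
for _ in range(50):
    agent.update(feat(None, 0), 1.0)    # positive human feedback
    agent.update(feat(None, 1), -1.0)   # negative human feedback
preferred = agent.choose_action(feat, None)  # -> 0
```

Note that, unlike standard reinforcement learning, this sketch treats the human's signal as a supervised target for the immediate reward model rather than as a return to be discounted over time, which is what lets the trainer shape behavior so quickly.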