基于模型的强化学习:综述

IF 65.3 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Foundations and Trends in Machine Learning Pub Date : 2020-06-30 DOI:10.1561/9781638280576

T. Moerland, J. Broekens, C. Jonker

{"title":"基于模型的强化学习:综述","authors":"T. Moerland, J. Broekens, C. Jonker","doi":"10.1561/9781638280576","DOIUrl":null,"url":null,"abstract":"Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two key sections, we also discuss the potential benefits of model-based RL, like enhanced data efficiency, targeted exploration, and improved stability. Along the survey, we also draw connections to several related RL fields, like hierarchical RL and transfer, and other research disciplines, like behavioural psychology. Altogether, the survey presents a broad conceptual overview of planning-learning combinations for MDP optimization.","PeriodicalId":47667,"journal":{"name":"Foundations and Trends in Machine Learning","volume":"34 1","pages":"1-118"},"PeriodicalIF":65.3000,"publicationDate":"2020-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Model-based Reinforcement Learning: A Survey\",\"authors\":\"T. Moerland, J. Broekens, C. Jonker\",\"doi\":\"10.1561/9781638280576\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two key sections, we also discuss the potential benefits of model-based RL, like enhanced data efficiency, targeted exploration, and improved stability. Along the survey, we also draw connections to several related RL fields, like hierarchical RL and transfer, and other research disciplines, like behavioural psychology. Altogether, the survey presents a broad conceptual overview of planning-learning combinations for MDP optimization.\",\"PeriodicalId\":47667,\"journal\":{\"name\":\"Foundations and Trends in Machine Learning\",\"volume\":\"34 1\",\"pages\":\"1-118\"},\"PeriodicalIF\":65.3000,\"publicationDate\":\"2020-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Foundations and Trends in Machine Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1561/9781638280576\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foundations and Trends in Machine Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1561/9781638280576","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 12

摘要

顺序决策，通常形式化为马尔可夫决策过程(MDP)优化，是人工智能中的一个关键挑战。解决这个问题的两个关键方法是强化学习(RL)和规划。本文概述了这两个领域的整合，即基于模型的强化学习。基于模型的强化学习有两个主要步骤。首先，我们系统地介绍了动态模型学习的方法，包括处理随机性、不确定性、部分可观察性和时间抽象等挑战。其次，我们对计划-学习整合进行了系统的分类，包括:从哪里开始计划，为计划和实际数据收集分配什么预算，如何计划，以及如何将计划整合到学习和行动循环中。在这两个关键部分之后，我们还讨论了基于模型的强化学习的潜在好处，比如增强的数据效率、有针对性的探索和改进的稳定性。在调查过程中，我们还与几个相关的强化学习领域(如等级强化学习和迁移)以及其他研究学科(如行为心理学)建立了联系。总的来说，该调查提供了用于MDP优化的计划-学习组合的广泛概念概述。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Model-based Reinforcement Learning: A Survey

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is a key challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two key sections, we also discuss the potential benefits of model-based RL, like enhanced data efficiency, targeted exploration, and improved stability. Along the survey, we also draw connections to several related RL fields, like hierarchical RL and transfer, and other research disciplines, like behavioural psychology. Altogether, the survey presents a broad conceptual overview of planning-learning combinations for MDP optimization.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Foundations and Trends in Machine Learning COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-

CiteScore

108.50

自引率

0.00%

发文量

期刊介绍： Each issue of Foundations and Trends® in Machine Learning comprises a monograph of at least 50 pages written by research leaders in the field. We aim to publish monographs that provide an in-depth, self-contained treatment of topics where there have been significant new developments. Typically, this means that the monographs we publish will contain a significant level of mathematical detail (to describe the central methods and/or theory for the topic at hand), and will not eschew these details by simply pointing to existing references. Literature surveys and original research papers do not fall within these aims.