A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

The American Statistician. Pub Date: 2022-05-09. DOI: 10.1080/00031305.2022.2129787

M. Tec, Yunshan Duan, P. Müller


Citations: 1


A Comparative Tutorial of Bayesian Sequential Design and Reinforcement Learning

Abstract

Reinforcement learning (RL) is a computational approach to reward-driven learning in sequential decision problems. It implements the discovery of optimal actions by learning from an agent interacting with an environment rather than from supervised data. We contrast and compare RL with traditional sequential design, focusing on simulation-based Bayesian sequential design (BSD). Recently, there has been an increasing interest in RL techniques for healthcare applications. We introduce two related applications as motivating examples. In both applications, the sequential nature of the decisions is restricted to sequential stopping. Rather than a comprehensive survey, the focus of the discussion is on solutions using standard tools for these two relatively simple sequential stopping problems. Both problems are inspired by adaptive clinical trial design. We use examples to explain the terminology and mathematical background that underlie each framework and map one to the other. The implementations and results illustrate the many similarities between RL and BSD. The results motivate the discussion of the potential strengths and limitations of each approach.

