将基于人类错误的学习与现代深度 RL 算法联系起来。

IF 2.1 4区计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation Pub Date : 2024-12-12 DOI:10.1162/neco_a_01721

Michele Garibbo;Casimir J. H. Ludwig;Nathan F. Lepora;Laurence Aitchison

{"title":"将基于人类错误的学习与现代深度 RL 算法联系起来。","authors":"Michele Garibbo;Casimir J. H. Ludwig;Nathan F. Lepora;Laurence Aitchison","doi":"10.1162/neco_a_01721","DOIUrl":null,"url":null,"abstract":"In human error–based learning, the size and direction of a scalar error (i.e., the “directed error”) are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between action updates of deep RL and human error–based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error–based learning. We show that all three deep RL approaches are qualitatively different from human error–based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error–based learning, model-based deterministic policy gradients (MB-DPG). We showed that MB-DPG captures human error–based learning under mirror-reversal and rotational perturbations and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL.","PeriodicalId":54731,"journal":{"name":"Neural Computation","volume":"37 1","pages":"128-159"},"PeriodicalIF":2.1000,"publicationDate":"2024-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Relating Human Error–Based Learning to Modern Deep RL Algorithms\",\"authors\":\"Michele Garibbo;Casimir J. H. Ludwig;Nathan F. Lepora;Laurence Aitchison\",\"doi\":\"10.1162/neco_a_01721\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In human error–based learning, the size and direction of a scalar error (i.e., the “directed error”) are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between action updates of deep RL and human error–based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error–based learning. We show that all three deep RL approaches are qualitatively different from human error–based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error–based learning, model-based deterministic policy gradients (MB-DPG). We showed that MB-DPG captures human error–based learning under mirror-reversal and rotational perturbations and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL.\",\"PeriodicalId\":54731,\"journal\":{\"name\":\"Neural Computation\",\"volume\":\"37 1\",\"pages\":\"128-159\"},\"PeriodicalIF\":2.1000,\"publicationDate\":\"2024-12-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10810338/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Computation","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10810338/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

在基于人类错误的学习中，标量错误（即 "定向错误"）的大小和方向被用来更新未来的行动。现代的深度强化学习（RL）方法也执行类似的操作，但都是以标量奖励为单位。尽管存在这种相似性，但深度强化学习的行动更新与基于人类错误的学习之间的关系仍有待研究。在这里，我们系统地比较了深度 RL 算法的三个主要系列与基于人类错误的学习之间的关系。通过镜像反转扰动实验的评估，我们发现所有三种深度 RL 方法都与基于人类错误的学习有质的区别。为了弥补这一差距，我们开发了另一种受基于人类错误学习启发的深度 RL 算法，即基于模型的确定性策略梯度（MB-DPG）。我们的研究表明，在镜像反转和旋转扰动下，MB-DPG 能捕捉到基于人类错误的学习，而且在复杂的基于手臂的伸手任务上，MB-DPG 比典型的无模型算法学习速度更快，同时比基于模型的 RL 对（前向）模型错误规范的鲁棒性更强。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Relating Human Error–Based Learning to Modern Deep RL Algorithms

In human error–based learning, the size and direction of a scalar error (i.e., the “directed error”) are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation but in terms of scalar rewards. Despite this similarity, the relationship between action updates of deep RL and human error–based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error–based learning. We show that all three deep RL approaches are qualitatively different from human error–based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we developed an alternative deep RL algorithm inspired by human error–based learning, model-based deterministic policy gradients (MB-DPG). We showed that MB-DPG captures human error–based learning under mirror-reversal and rotational perturbations and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks, while being more robust to (forward) model misspecification than model-based RL.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Neural Computation 工程技术-计算机：人工智能

CiteScore

6.30

自引率

3.40%

发文量

审稿时长

3.0 months

期刊介绍： Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.