Decision Threshold Learning in the Basal Ganglia for Multiple Alternatives

IF 2.1 · Zone 4, Computer Science · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neural Computation · Pub Date: 2025-06-17 · DOI: 10.1162/neco_a_01760

Thom Griffith, Sophie-Anne Baker, Nathan F. Lepora

Neural Computation, volume 37, issue 7, pages 1256-1287. Publication type: Journal Article. Listing URL: https://ieeexplore.ieee.org/document/11048846/

Citations: 0

Abstract



In recent years, researchers have integrated two historically separate approaches to decision modeling: reinforcement learning (RL) and evidence accumulation to bound. One outcome of these efforts is the RL-DDM, a model that combines value learning through reinforcement with a diffusion decision model (DDM). Although the RL-DDM is a conceptually elegant extension of the original DDM, it shares the DDM's problem of not scaling well to decisions with more than two options. Furthermore, in its current form, the RL-DDM lacks the flexibility to adapt to rapid, context-cued changes in the reward environment. How best to extend combined RL and DDM models to handle multiple choices remains an open question. Moreover, it is currently unclear how these algorithmic solutions should map to neurophysical processes in the brain, particularly in relation to so-called go/no-go-type models of decision making in the basal ganglia. Here, we propose a solution that addresses these issues by combining a previously proposed decision model based on the multichoice sequential probability ratio test (MSPRT) with a dual-pathway model of decision threshold learning in the basal ganglia. The model learns decision thresholds that optimize the trade-off between the cost of time and the cost of errors, and so allocates deliberation time efficiently. In addition, the model is context dependent and hence adapts flexibly to changes in the speed-accuracy trade-off (SAT) in the environment. The model also reproduces the magnitude effect, a phenomenon observed experimentally in value-based decisions. It is agnostic to the type of evidence, so it can be applied to perceptual decisions, value-based decisions, and other types of modeled evidence. More broadly, the model contributes to the active research area of how learning systems interact: it links the previously separate RL-DDM models to dopaminergic models of motivation and risk taking in the basal ganglia, and it scales to multiple alternatives.
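To make the trade-off between time cost and error cost concrete, the following is a minimal illustrative sketch of an MSPRT-style decision with a fixed posterior threshold. It is not the authors' implementation and does not include the dual-pathway threshold learning described in the abstract; it only scans a few thresholds by hand. The Gaussian evidence model and every name and default value (`msprt_trial`, `mean_cost`, `mean_gap`, `noise_sd`, `error_cost`, `time_cost`) are assumptions made for illustration.

```python
import math
import random


def msprt_trial(n_options, threshold, rng, mean_gap=0.5, noise_sd=1.0, max_steps=10_000):
    """Run one MSPRT-style trial; return (chosen_index, correct_index, steps_taken).

    Assumed evidence model: each option emits Gaussian samples, and only the
    correct option has its mean raised by `mean_gap`.
    """
    correct = rng.randrange(n_options)
    # Accumulated log-likelihood per option (terms shared by all options are dropped).
    log_lik = [0.0] * n_options
    gain = mean_gap / noise_sd ** 2
    best = 0
    for step in range(1, max_steps + 1):
        for i in range(n_options):
            mean = mean_gap if i == correct else 0.0
            log_lik[i] += gain * rng.gauss(mean, noise_sd)
        # Posterior under a uniform prior is a softmax of the log-likelihoods.
        peak = max(log_lik)
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_lik))
        best = max(range(n_options), key=lambda i: log_lik[i])
        # Commit as soon as the leading option's posterior crosses the threshold.
        if math.exp(log_lik[best] - log_norm) >= threshold:
            return best, correct, step
    return best, correct, max_steps


def mean_cost(n_options, threshold, n_trials=1000, error_cost=1.0, time_cost=0.01, seed=0):
    """Estimate (average cost, error rate, mean steps) for a fixed threshold."""
    rng = random.Random(seed)
    errors = 0
    total_steps = 0
    for _ in range(n_trials):
        chosen, correct, steps = msprt_trial(n_options, threshold, rng)
        errors += int(chosen != correct)
        total_steps += steps
    error_rate = errors / n_trials
    mean_steps = total_steps / n_trials
    return error_cost * error_rate + time_cost * mean_steps, error_rate, mean_steps


# Scan a few hand-picked thresholds for a four-option decision.
for threshold in (0.5, 0.7, 0.9, 0.99):
    cost, error_rate, steps = mean_cost(4, threshold)
    print(f"threshold={threshold:.2f} cost={cost:.3f} errors={error_rate:.3f} steps={steps:.1f}")
```

In this sketch, raising the threshold buys accuracy at the price of longer deliberation, so the average cost depends on how `error_cost` and `time_cost` are weighted. A learning rule such as the one the abstract describes would adjust the threshold from experience instead of relying on a manual scan.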


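Source journal

Neural Computation (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 6.30
Self-citation rate: 3.40%
Articles published: (value missing in source)
Review time: 3.0 months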

Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.