Optimal strategy for concurrent variable interval reinforcement schedule

Zhenbo Cheng, Ming Liang, Zhidong Deng
Published in: 2010 Chinese Control and Decision Conference (CCDC), 2010-05-26. DOI: 10.1109/CCDC.2010.5498938 (https://doi.org/10.1109/CCDC.2010.5498938)
Citations: 0

Abstract

Herrnstein experimentally studied the choice behavior of pigeons on a special reinforcement schedule, the concurrent variable interval (CVI) schedule, and discovered the well-known matching law. This empirical law is remarkably conserved across many species, yet matching has been regarded as irrational behavior, in the sense that it does not maximize reward. In this paper, we show concisely that, for the CVI reinforcement schedule in discrete time steps, any strategy that produces the matching law obtains the maximal reward. In addition, we propose a novel strategy algorithm that earns the maximal reward on the CVI schedule. Our results indicate that matching behavior can be regarded as rational behavior under this reinforcement schedule.
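The abstract's central claim (matching can coincide with reward maximization on a discrete-time CVI schedule) can be explored numerically. The sketch below is not the authors' model or algorithm. It assumes a simple discrete-time schedule in which each unarmed alternative i becomes armed with probability p_i on every step and then holds its reward until chosen, and it restricts the agent to stationary random policies that pick alternative 1 with probability q. The function names `reward_rate`, `total_rate`, `best_q`, `simulate` and the values p1 = 0.10, p2 = 0.03 are hypothetical. Under these assumptions, a two-state Markov chain gives a per-alternative rate r with 1/r = 1/p + 1/c - 1, so dr/dc = r^2/c^2, and maximizing r1(q) + r2(1 - q) yields r1/q = r2/(1 - q). That first-order condition is exactly Herrnstein's matching relation B1/(B1 + B2) = R1/(R1 + R2), with q as the choice share; since the total rate is concave in q, it picks out the unique optimum.

```python
import random


def reward_rate(p: float, c: float) -> float:
    """Analytic long-run rewards per step from one alternative.

    Hypothetical model: each step, an unarmed alternative first becomes armed
    with probability p (an armed one stays armed until collected); then the
    agent chooses this alternative with probability c. Solving the resulting
    two-state (armed/unarmed) Markov chain gives 1/r = 1/p + 1/c - 1.
    """
    if c <= 0.0:
        return 0.0
    return 1.0 / (1.0 / p + 1.0 / c - 1.0)


def total_rate(p1: float, p2: float, q: float) -> float:
    """Total reward per step when alternative 1 is chosen with probability q."""
    return reward_rate(p1, q) + reward_rate(p2, 1.0 - q)


def best_q(p1: float, p2: float, grid: int = 1000) -> float:
    """Grid search for the reward-maximizing choice probability q in [0, 1]."""
    qs = [i / grid for i in range(grid + 1)]
    return max(qs, key=lambda q: total_rate(p1, p2, q))


def simulate(p1: float, p2: float, q: float, steps: int = 100_000, seed: int = 0):
    """Monte Carlo run of the same schedule; returns (choices, rewards) per alternative."""
    rng = random.Random(seed)
    probs = (p1, p2)
    armed = [False, False]
    choices = [0, 0]
    rewards = [0, 0]
    for _ in range(steps):
        # Arming phase: once armed, a reward waits until it is collected.
        for i in range(2):
            if not armed[i] and rng.random() < probs[i]:
                armed[i] = True
        # Choice phase: stationary random policy, alternative 0 with probability q.
        k = 0 if rng.random() < q else 1
        choices[k] += 1
        if armed[k]:
            rewards[k] += 1
            armed[k] = False
    return choices, rewards


if __name__ == "__main__":
    p1, p2 = 0.10, 0.03  # hypothetical per-step arming probabilities
    q = best_q(p1, p2)
    r1, r2 = reward_rate(p1, q), reward_rate(p2, 1.0 - q)
    print(f"optimal q = {q:.3f}, reward share = {r1 / (r1 + r2):.3f}")
    choices, rewards = simulate(p1, p2, q)
    print("simulated choice share:", choices[0] / sum(choices))
    print("simulated reward share:", rewards[0] / sum(rewards))
```

With these illustrative values, the optimal q and the share of rewards earned from alternative 1 should both come out close to 0.78, and the simulated shares should agree with them up to sampling noise. A grid search is used instead of a closed-form solution only to keep the check transparent.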