Optimal strategy for concurrent variable interval reinforcement schedule

2010 Chinese Control and Decision Conference, Pub Date: 2010-05-26, DOI: 10.1109/CCDC.2010.5498938

Zhenbo Cheng, Ming Liang, Zhidong Deng


Citations: 0

Abstract

Herrnstein experimentally studied the choice behavior of pigeons on a special reinforcement schedule, the concurrent variable interval (CVI) schedule, and discovered the well-known matching law. This empirical law is remarkably conserved across many species, but it has been viewed as irrational behavior, in the sense that matching does not maximize reward. In this paper, we succinctly demonstrate that any strategy leading to the matching law can obtain the maximal reward on the CVI reinforcement schedule in discrete time steps. In addition, we put forward a novel strategy algorithm that can earn the maximal reward on the CVI reinforcement schedule. Our results reveal that matching behavior can be seen as rational behavior under this reinforcement schedule.
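The abstract does not spell out the schedule model, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a commonly used discrete-time formulation of a two-option CVI schedule: on every step each unbaited option becomes baited with a fixed probability, a bait stays in place until that option is chosen, and the agent picks option 0 with a fixed probability `q`. The names `simulate_cvi`, `p_bait`, and `q` are hypothetical. Comparing the choice fraction with the income fraction for different values of `q` gives a way to explore where matching holds and how the total reward rate behaves.

```python
import random


def simulate_cvi(p_bait, q, steps, seed=0):
    """Simulate an assumed two-option, discrete-time CVI schedule.

    p_bait: (p0, p1), per-step probability that an unbaited option becomes baited.
    q:      probability that the agent chooses option 0 on a given step.
    Returns (choices, rewards), each a two-element list of counts.
    """
    rng = random.Random(seed)
    baited = [False, False]
    choices = [0, 0]
    rewards = [0, 0]
    for _ in range(steps):
        # Arm each option that is not already holding a reward.
        for i in (0, 1):
            if not baited[i] and rng.random() < p_bait[i]:
                baited[i] = True
        # Fixed-probability stochastic choice (an assumption, not the paper's strategy).
        a = 0 if rng.random() < q else 1
        choices[a] += 1
        # Collect the reward if the chosen option is baited, then clear it.
        if baited[a]:
            rewards[a] += 1
            baited[a] = False
    return choices, rewards


if __name__ == "__main__":
    steps = 200_000
    for q in (0.2, 0.4, 0.5, 0.6, 0.8):
        choices, rewards = simulate_cvi((0.10, 0.05), q, steps)
        total = sum(rewards)
        choice_frac = choices[0] / steps
        income_frac = rewards[0] / total if total else float("nan")
        print(f"q={q:.1f}  choice fraction={choice_frac:.3f}  "
              f"income fraction={income_frac:.3f}  reward rate={total / steps:.4f}")
```

Under the matching law the choice fraction and the income fraction for an option are equal, so the value of `q` at which the two printed columns coincide is the matching point for this assumed model.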
