Optimal strategy for concurrent variable interval reinforcement schedule
Zhenbo Cheng, Ming Liang, Zhidong Deng
2010 Chinese Control and Decision Conference, 2010-05-26. DOI: 10.1109/CCDC.2010.5498938
Herrnstein experimentally studied the choice behavior of pigeons on a special reinforcement schedule, the concurrent variable interval (CVI) schedule, and discovered the well-known matching law. This empirical law is remarkably conserved across many species, yet matching has been regarded as irrational, in the sense that it does not maximize reward. In this paper, we concisely demonstrate that, in discrete time steps, any strategy that produces matching obtains the maximal reward on the CVI reinforcement schedule. In addition, we propose a novel strategy algorithm that earns the maximal reward on the CVI schedule. Our results show that matching behavior can be regarded as rational on this reinforcement schedule.
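To make the abstract's claim concrete, here is a minimal Python sketch under assumptions that are ours, not necessarily the paper's. It uses a two-key discrete-time CVI schedule in which an unarmed key becomes armed with a fixed per-step probability p_i (a random-interval stand-in for a VI schedule). An armed reward waits until it is collected, and the agent is memoryless, choosing key 0 with probability x on every step. Herrnstein's matching law states that the response share equals the reward share, B_0/(B_0 + B_1) = R_0/(R_0 + R_1). The function names `steady_state_rates`, `best_mixing_probability` and `simulate`, and the arming probabilities, are hypothetical.

```python
import random


def steady_state_rates(x, p0, p1):
    """Long-run reward per step from each key for a memoryless chooser (assumed model).

    Each step, an unarmed key i arms with probability p_i (0 < p_i <= 1); the agent
    then picks key 0 with probability x (else key 1) and collects the reward if the
    chosen key is armed. For a key chosen with probability q and armed with
    probability p, the steady-state probability that it is armed when a choice is
    made is p / (q + p - p*q), so its reward rate is q times that.
    """
    rates = []
    for q, p in ((x, p0), (1.0 - x, p1)):
        rates.append(p * q / (q + p - p * q))
    return rates


def best_mixing_probability(p0, p1, grid=10_000):
    """Grid search for the choice probability x that maximizes total reward rate."""
    best_x, best_total = 0.0, -1.0
    for k in range(grid + 1):
        x = k / grid
        total = sum(steady_state_rates(x, p0, p1))
        if total > best_total:
            best_x, best_total = x, total
    return best_x, best_total


def simulate(x, p0, p1, steps=200_000, seed=0):
    """Monte Carlo run of the same assumed schedule; returns per-key reward and choice counts."""
    rng = random.Random(seed)
    p = (p0, p1)
    armed = [False, False]
    rewards = [0, 0]
    choices = [0, 0]
    for _ in range(steps):
        # Arming phase: an armed reward stays armed until collected (the VI property).
        for i in range(2):
            if not armed[i] and rng.random() < p[i]:
                armed[i] = True
        # Choice phase: memoryless mixing between the two keys.
        c = 0 if rng.random() < x else 1
        choices[c] += 1
        if armed[c]:
            rewards[c] += 1
            armed[c] = False
    return rewards, choices


if __name__ == "__main__":
    p0, p1 = 0.10, 0.03  # hypothetical per-step arming probabilities
    x, total = best_mixing_probability(p0, p1)
    r0, r1 = steady_state_rates(x, p0, p1)
    print(f"best x = {x:.4f}, total reward per step = {total:.4f}")
    print(f"response share = {x:.4f}, reward share = {r0 / (r0 + r1):.4f}")
    rewards, choices = simulate(x, p0, p1)
    print(f"simulated: response share = {choices[0] / sum(choices):.4f}, "
          f"reward share = {rewards[0] / sum(rewards):.4f}")
```

In this toy model, the derivative of p_i q_i / (q_i + p_i - p_i q_i) with respect to q_i is p_i^2 / (q_i + p_i - p_i q_i)^2. The interior optimum therefore equalizes p_i / (q_i + p_i - p_i q_i) across the two keys, and that is exactly the condition under which the reward share equals the response share. This illustrates the abstract's claim in a restricted setting; it is not a reproduction of the paper's proof or of its proposed algorithm.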