A new learning automaton for interaction with triple level environments

2012 IEEE 11th International Conference on Cognitive Informatics and Cognitive Computing. Published 2012-09-24. DOI: 10.1109/ICCI-CC.2012.6311198

A. Jamalian, R. Rezvani, Hossein Shams, Shamim Mehrabi


Citations: 7

Abstract

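To date, most Learning Automata (LA) have been designed to interact with two-level environments (one response level for reward and the other for penalty). Such automata are often expedient, optimal, or both: over time they minimize the mean penalty they receive (or at least converge toward the minimum) and perform much better than a pure-chance automaton. In many practical applications, however, the environment has three response levels: one for reward, one for a small-scale penalty, and one for a large-scale penalty. In these applications the existing LA not only fail to minimize the mean penalty, but in some cases incur a higher mean penalty than a pure-chance automaton. This paper first describes triple-level environments precisely, with an illustrative example. It then introduces a new fixed-structure stochastic LA, called TILA, and analyzes its properties mathematically. Simulation results show that the existing LA cannot converge to the minimum mean penalty, whereas TILA receives fewer penalties than the older automata.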
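The central claim above, that an automaton built for two-level environments can do worse than pure chance once penalties come in two sizes, can be illustrated with a small Python sketch. This is not TILA (the abstract does not give its update rules); it is the classic Tsetlin L_{2N,2} fixed-structure automaton, which lumps small and large penalties together. The `Action` class, the cost weights (1 for a small penalty, 10 for a large one), and the response probabilities are hypothetical values chosen for illustration, and "mean penalty" is assumed here to mean the expected cost under those weights.

```python
import random
from dataclasses import dataclass
from typing import Mapping, Sequence

# Three response levels of the (hypothetical) triple-level environment.
REWARD, SMALL, LARGE = 0, 1, 2


@dataclass(frozen=True)
class Action:
    """Illustrative response probabilities for one action; reward occurs otherwise."""
    p_small: float  # probability of a small-scale penalty
    p_large: float  # probability of a large-scale penalty


def respond(action: Action, rng: random.Random) -> int:
    """Sample the environment's three-level response to one chosen action."""
    u = rng.random()
    if u < action.p_large:
        return LARGE
    if u < action.p_large + action.p_small:
        return SMALL
    return REWARD


def expected_cost(action: Action, costs: Mapping[int, float]) -> float:
    """Mean penalty of always choosing `action`, under assumed cost weights."""
    return action.p_small * costs[SMALL] + action.p_large * costs[LARGE]


def pure_chance_cost(actions: Sequence[Action], costs: Mapping[int, float]) -> float:
    """A pure-chance automaton picks each of the r actions with probability 1/r."""
    return sum(expected_cost(a, costs) for a in actions) / len(actions)


def simulate_tsetlin(actions: Sequence[Action], costs: Mapping[int, float],
                     n: int, steps: int, seed: int = 0) -> float:
    """Two-action Tsetlin L_{2N,2} automaton that sees only 'reward' vs 'penalty'.

    States 1..n select action 0 and states n+1..2n select action 1.
    Returns the average cost actually incurred over `steps` interactions.
    """
    rng = random.Random(seed)
    state = n  # start at the boundary state of action 0
    total = 0.0
    for _ in range(steps):
        a = 0 if state <= n else 1
        r = respond(actions[a], rng)
        total += costs[r]
        if r == REWARD:
            # Reward: move one state deeper into the current action's block
            # (staying put at the innermost state).
            state = max(1, state - 1) if a == 0 else min(2 * n, state + 1)
        else:
            # Small and large penalties are treated identically: move one state
            # toward the boundary, switching actions when it is crossed.
            state = state + 1 if a == 0 else state - 1
    return total / steps


if __name__ == "__main__":
    costs = {REWARD: 0.0, SMALL: 1.0, LARGE: 10.0}  # assumed penalty magnitudes
    actions = [Action(p_small=0.6, p_large=0.0), Action(p_small=0.1, p_large=0.2)]
    print("pure chance:", pure_chance_cost(actions, costs))
    print("Tsetlin L_{2N,2}:", simulate_tsetlin(actions, costs, n=5, steps=200_000))
```

With these numbers the two-level automaton sees penalty probabilities 0.6 and 0.3, so it spends most of its time on the second action, whose expected cost is 0.1 × 1 + 0.2 × 10 = 2.1, while a pure-chance automaton averages (0.6 + 2.1) / 2 = 1.35. The simulated average therefore lands well above the pure-chance value, which is the kind of failure the abstract attributes to older LA in triple-level environments.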


