A new learning automaton for interaction with triple level environments

A. Jamalian, R. Rezvani, Hossein Shams, Shamim Mehrabi
2012 IEEE 11th International Conference on Cognitive Informatics and Cognitive Computing, published 2012-09-24
DOI: 10.1109/ICCI-CC.2012.6311198 (https://doi.org/10.1109/ICCI-CC.2012.6311198)
Citations: 7

Abstract

Until now, most learning automata (LA) have been designed to interact with double level environments (one response level for reward and another for penalty). These LA are often expedient, optimal, or both: over time they can minimize the mean value of the penalties they receive (or at least converge to that minimum), and they perform much better than a pure-chance automaton. In many practical applications, however, the environment has three response levels: one for reward, one for a small-scale penalty, and one for a large-scale penalty. In such applications, the existing LA not only fail to minimize the mean penalty received, but in some cases receive a higher mean penalty than a pure-chance automaton. This paper first describes triple level environments precisely, with an illustrative example. It then introduces a new fixed-structure stochastic LA, called TILA, and analyzes its properties mathematically. Simulation results show that the older LA cannot converge to the minimum mean penalty, whereas TILA receives fewer penalties than the older automata.
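To make the failure mode described above concrete, here is a minimal Python sketch under stated assumptions. It does not implement TILA, whose update rules are not given in this abstract. Instead it builds a hypothetical two-action triple level environment (the response probabilities in `ENV` and the penalty magnitudes in `COST` are invented for illustration) and drives a classic Tsetlin automaton L_{2N,2}, a standard fixed-structure LA for double level environments, by collapsing both penalty levels into one binary penalty signal. It then compares that automaton's mean penalty per step with the mean penalty of a pure-chance automaton.

```python
import random

# Hypothetical triple level environment (illustrative numbers, not from the paper).
# For each action: probabilities of (reward, small penalty, large penalty).
ENV = [
    (0.6, 0.0, 0.4),  # action 0: penalized less often, but every penalty is large
    (0.4, 0.6, 0.0),  # action 1: penalized more often, but only lightly
]
# Assumed magnitudes of each response level: reward, small penalty, large penalty.
COST = (0.0, 0.2, 1.0)


def expected_cost(action):
    """Exact mean penalty per step if `action` is chosen every time."""
    return sum(p * c for p, c in zip(ENV[action], COST))


def respond(action, rng):
    """Sample a response level: 0 = reward, 1 = small penalty, 2 = large penalty."""
    return rng.choices((0, 1, 2), weights=ENV[action])[0]


class Tsetlin:
    """Two-action Tsetlin automaton L_{2N,2} with `depth` states per action.

    It accepts only binary feedback, so both penalty levels are treated alike.
    """

    def __init__(self, depth):
        self.depth = depth
        self.action = 0
        self.level = 1  # 1 = boundary state, depth = innermost (most confident) state

    def update(self, penalized):
        if not penalized:
            self.level = min(self.level + 1, self.depth)  # reward: move inward
        elif self.level > 1:
            self.level -= 1  # penalty: move toward the boundary
        else:
            self.action = 1 - self.action  # penalty at the boundary: switch action


def simulate_tsetlin(depth=6, steps=100_000, seed=1):
    """Average penalty per step of a Tsetlin automaton interacting with ENV."""
    rng = random.Random(seed)
    la = Tsetlin(depth)
    total = 0.0
    for _ in range(steps):
        level = respond(la.action, rng)
        total += COST[level]
        la.update(penalized=level > 0)  # small and large penalties look identical
    return total / steps


def simulate_pure_chance(steps=100_000, seed=1):
    """Average penalty per step when each action is picked uniformly at random."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        total += COST[respond(rng.randrange(2), rng)]
    return total / steps


if __name__ == "__main__":
    print("expected cost, action 0:", expected_cost(0))
    print("expected cost, action 1:", expected_cost(1))
    print("pure-chance automaton:  ", simulate_pure_chance())
    print("Tsetlin L_{2N,2}, N=6:  ", simulate_tsetlin())
```

With these assumed numbers, always choosing action 0 costs 0.40 per step on average and action 1 costs 0.12, so a pure-chance automaton averages 0.26. Because action 0 is penalized less often (0.4 versus 0.6), the binary-feedback Tsetlin automaton of depth 6 spends roughly 92% of its time on action 0 in the long run, so its simulated mean penalty should land near 0.38, worse than pure chance. The relative weighting of small and large penalties is an assumption here; other weightings can change the outcome.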