Authors: Prachi Pratyusha Sahoo; Kyriakos G. Vamvoudakis
Journal: IEEE Transactions on Artificial Intelligence, vol. 5, no. 12, pp. 6566-6579
Published: 2024-09-18
DOI: 10.1109/TAI.2024.3433614
URL: https://ieeexplore.ieee.org/document/10684035/
Self-Model-Free Learning Versus Learning With External Rewards in Information Constrained Environments
In this article, we provide a model-free reinforcement learning (RL) framework that relies on internal reinforcement signals, called self-model-free RL, for learning agents that experience loss of the reinforcement signals in the form of packet drops and/or jamming attacks by malicious agents. The framework embeds a correcting mechanism in the form of a goal network to compensate for information loss and produce optimal and stabilizing policies. It also provides a trade-off scheme that reconstructs the reward using a goal network whenever the reinforcement signals are lost but utilizes true reinforcement signals when they are available. The stability of the equilibrium point is guaranteed despite fractional information loss in the reinforcement signals. Finally, simulation results validate the efficacy of the proposed work.
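The trade-off scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the Bernoulli packet-drop model, and the quadratic stand-in for the goal network are all assumptions made for illustration. It shows only the switching idea, namely using the true reinforcement signal when it arrives and a reconstructed one when it is lost.

```python
import random
from typing import Callable, Optional, Sequence

State = Sequence[float]


def lossy_channel(reward: float, drop_probability: float,
                  rng: random.Random) -> Optional[float]:
    """Assumed loss model: the reward is dropped (packet loss or jamming)
    with the given probability, in which case None is returned."""
    return None if rng.random() < drop_probability else reward


def toy_goal_network(state: State) -> float:
    """Hypothetical stand-in for a trained goal network: it reconstructs
    the reward as a negative quadratic cost on the state."""
    return -sum(x * x for x in state)


def effective_reward(true_reward: Optional[float], state: State,
                     goal_network: Callable[[State], float]) -> float:
    """Use the true reinforcement signal when it is available; otherwise
    fall back to the goal network's reconstruction."""
    if true_reward is not None:
        return true_reward
    return goal_network(state)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    state = [1.0, 2.0]
    for step in range(5):
        received = lossy_channel(-5.0, drop_probability=0.5, rng=rng)
        used = effective_reward(received, state, toy_goal_network)
        source = "true" if received is not None else "reconstructed"
        print(f"step {step}: reward={used} ({source})")
```

In a real learner, the value returned by `effective_reward` would feed the usual model-free update, and the goal network would itself be trained while true signals are available. Both parts are omitted here.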