Andrew Zhao, Erle Zhu, Rui Lu, Matthieu Lin, Yong-Jin Liu, Gao Huang
{"title":"无监督强化学习的自参考智能体","authors":"Andrew Zhao , Erle Zhu , Rui Lu , Matthieu Lin , Yong-Jin Liu , Gao Huang","doi":"10.1016/j.neunet.2025.107448","DOIUrl":null,"url":null,"abstract":"<div><div>Current unsupervised reinforcement learning methods often overlook reward nonstationarity during pre-training and the forgetting of exploratory behavior during fine-tuning. Our study introduces Self-Reference (SR), a novel add-on module designed to address both issues. SR stabilizes intrinsic rewards through historical referencing in pre-training, mitigating nonstationarity. During fine-tuning, it preserves exploratory behaviors, retaining valuable skills. Our approach significantly boosts the performance and sample efficiency of existing URL model-free methods on the Unsupervised Reinforcement Learning Benchmark, improving IQM by up to 17% and reducing the Optimality Gap by 31%. This highlights the general applicability and compatibility of our add-on module with existing methods.</div></div>","PeriodicalId":49763,"journal":{"name":"Neural Networks","volume":"188 ","pages":"Article 107448"},"PeriodicalIF":6.0000,"publicationDate":"2025-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Self-Referencing Agents for Unsupervised Reinforcement Learning\",\"authors\":\"Andrew Zhao , Erle Zhu , Rui Lu , Matthieu Lin , Yong-Jin Liu , Gao Huang\",\"doi\":\"10.1016/j.neunet.2025.107448\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Current unsupervised reinforcement learning methods often overlook reward nonstationarity during pre-training and the forgetting of exploratory behavior during fine-tuning. Our study introduces Self-Reference (SR), a novel add-on module designed to address both issues. SR stabilizes intrinsic rewards through historical referencing in pre-training, mitigating nonstationarity. During fine-tuning, it preserves exploratory behaviors, retaining valuable skills. Our approach significantly boosts the performance and sample efficiency of existing URL model-free methods on the Unsupervised Reinforcement Learning Benchmark, improving IQM by up to 17% and reducing the Optimality Gap by 31%. 
This highlights the general applicability and compatibility of our add-on module with existing methods.</div></div>\",\"PeriodicalId\":49763,\"journal\":{\"name\":\"Neural Networks\",\"volume\":\"188 \",\"pages\":\"Article 107448\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2025-04-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neural Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0893608025003272\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neural Networks","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0893608025003272","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Self-Referencing Agents for Unsupervised Reinforcement Learning
Current unsupervised reinforcement learning (URL) methods often overlook reward nonstationarity during pre-training and the forgetting of exploratory behavior during fine-tuning. Our study introduces Self-Reference (SR), a novel add-on module designed to address both issues. SR stabilizes intrinsic rewards through historical referencing during pre-training, mitigating nonstationarity. During fine-tuning, it preserves exploratory behaviors, retaining valuable skills. Our approach significantly boosts the performance and sample efficiency of existing model-free URL methods on the Unsupervised Reinforcement Learning Benchmark, improving the interquartile mean (IQM) by up to 17% and reducing the Optimality Gap by 31%. This highlights the general applicability of our add-on module and its compatibility with existing methods.
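The abstract does not give implementation details, but the core idea lends itself to a small illustration. Below is a minimal, hypothetical sketch of how "historical referencing" could stabilize an intrinsic reward: the agent keeps a buffer of past observations, retrieves the k nearest neighbors of the current observation, and averages the intrinsic reward over the query and its references. The class name, the retrieval rule, and the averaging scheme are all assumptions for illustration, not the authors' Self-Reference implementation.

```python
# Hypothetical sketch of the Self-Reference (SR) idea from the abstract:
# an add-on that queries a buffer of historical observations and lets the
# agent condition intrinsic-reward computation on retrieved neighbors,
# damping a reward signal that would otherwise drift as the policy changes.
# Names, shapes, and the retrieval rule are assumptions, not the paper's code.
import numpy as np


class SelfReferenceModule:
    def __init__(self, obs_dim: int, capacity: int = 10000, k: int = 8):
        self.buffer = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.capacity = capacity
        self.size = 0
        self.k = k

    def add(self, obs: np.ndarray) -> None:
        # Ring buffer of past observations collected during pre-training.
        self.buffer[self.size % self.capacity] = obs
        self.size += 1

    def retrieve(self, obs: np.ndarray) -> np.ndarray:
        # k nearest historical references by Euclidean distance.
        n = min(self.size, self.capacity)
        dists = np.linalg.norm(self.buffer[:n] - obs, axis=1)
        idx = np.argsort(dists)[: self.k]
        return self.buffer[idx]

    def stabilized_reward(self, obs: np.ndarray, intrinsic_fn) -> float:
        # Average the intrinsic reward over the query and its historical
        # neighbors -- one plausible way referencing could mitigate the
        # nonstationarity of a changing intrinsic reward signal.
        refs = self.retrieve(obs)
        rewards = [intrinsic_fn(r) for r in refs] + [intrinsic_fn(obs)]
        return float(np.mean(rewards))


# Hypothetical usage with a toy intrinsic reward (negative state norm):
sr = SelfReferenceModule(obs_dim=4)
for _ in range(100):
    sr.add(np.random.randn(4).astype(np.float32))
r = sr.stabilized_reward(
    np.zeros(4, dtype=np.float32),
    intrinsic_fn=lambda s: -float(np.linalg.norm(s)),
)
```

Averaging over retrieved references is just one plausible smoothing choice; the broader point is that conditioning on an accumulated history gives the intrinsic reward a reference frame that changes more slowly than the policy itself.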
Journal Introduction:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.