Directly Attention Loss Adjusted Prioritized Experience Replay
Zhuoying Chen, Huiping Li, Zhaoxu Wang
Complex & Intelligent Systems (JCR Q1, Computer Science, Artificial Intelligence; IF 5.0), published 2025-04-25
DOI: 10.1007/s40747-025-01852-6 — https://doi.org/10.1007/s40747-025-01852-6
Citations: 0
Abstract
Prioritized Experience Replay enables the model to learn more from relatively important samples by artificially increasing the frequency with which they are accessed. However, this non-uniform sampling shifts the state–action distribution originally used to estimate Q-value functions, which introduces estimation bias. In this article, a novel off-policy reinforcement learning training framework called Directly Attention Loss Adjusted Prioritized Experience Replay (DALAP) is proposed, which directly quantifies the extent of the distribution shift through a Parallel Self-Attention network, enabling precise error compensation. Furthermore, a Priority-Encouragement mechanism is designed to optimize the sample-screening criteria and enhance training efficiency. To verify the effectiveness of DALAP, a realistic multi-USV environment based on Unreal Engine is constructed. Comparative experiments across multiple groups demonstrate that DALAP offers significant advantages, including faster convergence and smaller training variance.
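For context on the distribution shift the abstract describes: standard proportional PER already compensates non-uniform sampling with importance-sampling weights (DALAP's contribution is a learned, attention-based estimate of the shift, which is not reproduced here). The following is a minimal sketch of a conventional prioritized buffer with that classic correction; all class and parameter names (`PrioritizedReplayBuffer`, `alpha`, `beta`) are illustrative, not from the paper.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay sketch (not the paper's DALAP)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha              # how strongly priority skews sampling
        self.data = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos = 0

    def add(self, transition):
        # new samples get the current max priority so they are seen at least once
        max_p = self.priorities[:len(self.data)].max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # importance-sampling weights correct the shifted state-action
        # distribution that prioritized sampling induces
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()        # normalize for stable loss scaling
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # priority proportional to |TD error|; eps keeps every sample reachable
        self.priorities[idx] = np.abs(td_errors) + eps
```

DALAP replaces the fixed `(N * P(i))^(-beta)` correction above with a directly quantified compensation term; the sketch only shows the baseline mechanism the paper improves on.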
About the journal
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.