{"title":"相位平行网络策略:基于行动相关性的深度强化学习框架","authors":"Jiahao Li, Tianhan Gao, Qingwei Mi","doi":"10.1007/s00607-024-01329-3","DOIUrl":null,"url":null,"abstract":"<p>Reinforcement learning algorithms show significant variations in performance across different environments. Optimization for reinforcement learning thus becomes the major research task since the instability and unpredictability of the reinforcement learning algorithms have consistently hindered their generalization capabilities. In this study, we address this issue by optimizing the algorithm itself rather than environment-specific optimizations. We start by tackling the uncertainty caused by the mutual influence of original action interferences, aiming to enhance the overall performance. The <i>Phasic Parallel-Network Policy</i> (PPP), which is a deep reinforcement learning framework. It diverges from the traditional policy actor-critic method by grouping the action space based on action correlations. The PPP incorporates parallel network structures and combines network optimization strategies. With the assistance of the value network, the training process is divided into different specific stages, namely the Extra-group Policy Phase and the Inter-group Optimization Phase. PPP breaks through the traditional unit learning structure. The experimental results indicate that it not only optimizes training effectiveness but also reduces training steps, enhances sample efficiency, and significantly improves stability and generalization.</p>","PeriodicalId":10718,"journal":{"name":"Computing","volume":"34 1","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Phasic parallel-network policy: a deep reinforcement learning framework based on action correlation\",\"authors\":\"Jiahao Li, Tianhan Gao, Qingwei Mi\",\"doi\":\"10.1007/s00607-024-01329-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Reinforcement learning algorithms show significant variations in performance across different environments. Optimization for reinforcement learning thus becomes the major research task since the instability and unpredictability of the reinforcement learning algorithms have consistently hindered their generalization capabilities. In this study, we address this issue by optimizing the algorithm itself rather than environment-specific optimizations. We start by tackling the uncertainty caused by the mutual influence of original action interferences, aiming to enhance the overall performance. The <i>Phasic Parallel-Network Policy</i> (PPP), which is a deep reinforcement learning framework. It diverges from the traditional policy actor-critic method by grouping the action space based on action correlations. The PPP incorporates parallel network structures and combines network optimization strategies. With the assistance of the value network, the training process is divided into different specific stages, namely the Extra-group Policy Phase and the Inter-group Optimization Phase. PPP breaks through the traditional unit learning structure. 
The experimental results indicate that it not only optimizes training effectiveness but also reduces training steps, enhances sample efficiency, and significantly improves stability and generalization.</p>\",\"PeriodicalId\":10718,\"journal\":{\"name\":\"Computing\",\"volume\":\"34 1\",\"pages\":\"\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2024-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s00607-024-01329-3\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00607-024-01329-3","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Phasic parallel-network policy: a deep reinforcement learning framework based on action correlation
Reinforcement learning algorithms show significant variations in performance across different environments. Optimizing reinforcement learning itself has therefore become a major research task, since the instability and unpredictability of reinforcement learning algorithms have consistently hindered their generalization capabilities. In this study, we address this issue by optimizing the algorithm itself rather than applying environment-specific optimizations. We start by tackling the uncertainty caused by mutual interference among the original actions, aiming to enhance overall performance. We propose the Phasic Parallel-Network Policy (PPP), a deep reinforcement learning framework that diverges from the traditional actor-critic policy method by grouping the action space based on action correlations. PPP incorporates parallel network structures and combines them with network optimization strategies. With the assistance of the value network, the training process is divided into specific stages, namely the Extra-group Policy Phase and the Inter-group Optimization Phase. PPP breaks through the traditional unit learning structure. The experimental results indicate that PPP not only optimizes training effectiveness but also reduces training steps, enhances sample efficiency, and significantly improves stability and generalization.
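Since the abstract only outlines the architecture, the following is a minimal sketch of how an actor-critic policy with per-group parallel heads and a shared value network might be laid out. The class name PhasicParallelPolicy, the action_groups partition, the layer sizes, and the use of categorical heads are illustrative assumptions inferred from the abstract, not the authors' implementation.

```python
# Minimal sketch (assumptions, not the paper's code): the action space is
# partitioned into groups of correlated actions, each group gets its own
# parallel policy head, and a shared value network assists the phased training
# described in the abstract (Extra-group Policy Phase / Inter-group
# Optimization Phase). Group sizes and widths are hypothetical.
import torch
import torch.nn as nn


class PhasicParallelPolicy(nn.Module):
    def __init__(self, obs_dim, action_groups, hidden=64):
        # action_groups: list of group sizes, e.g. [3, 2] for a 5-action space
        # split into two correlated groups (hypothetical partition).
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        # One parallel policy head per action group.
        self.policy_heads = nn.ModuleList(
            nn.Linear(hidden, g) for g in action_groups
        )
        # Shared value network.
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, obs):
        h = self.encoder(obs)
        # Each head yields a categorical distribution over its own group.
        dists = [torch.distributions.Categorical(logits=head(h))
                 for head in self.policy_heads]
        value = self.value_head(h).squeeze(-1)
        return dists, value


if __name__ == "__main__":
    policy = PhasicParallelPolicy(obs_dim=8, action_groups=[3, 2])
    obs = torch.randn(4, 8)                  # batch of 4 observations
    dists, value = policy(obs)
    actions = [d.sample() for d in dists]    # one sub-action per group
    print([a.shape for a in actions], value.shape)
```

In such a layout, the two training phases described in the abstract would presumably alternate between updating the group-specific heads and jointly optimizing across groups with the value network's guidance; the exact schedule is not specified in the abstract.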
Journal Description:
Computing publishes original papers, short communications, and surveys on all fields of computing. Contributions should be written in English and may be of a theoretical or applied nature; the essential criteria are computational relevance and a systematic foundation of results.