{"title":"利用遗传规划改进离线强化学习的数据收集","authors":"David Halder, Georgios Douzas, Fernando Bacao","doi":"10.1016/j.swevo.2025.102140","DOIUrl":null,"url":null,"abstract":"<div><div>Offline Reinforcement Learning (RL) learns policies solely from fixed pre-collected datasets, making it applicable to use-cases where data collection is expensive or risky. Consequently, the performance of these offline learners is highly dependent on the dataset used. Still the questions of how this data is collected and what dataset characteristics are needed are not thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), combining the two learning paradigms to exploit their supplementary attributes. This study aims to join these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL and its potential to enhance the performance of offline RL algorithms. A comparative approach was employed, comparing Deep Q-Networks (DQN) and GP for data collection across multiple environments and collection modes. The exploration and exploitation capabilities of these methods were quantified and a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation compared to DQN and produced high trajectory quality datasets across all environments. More offline algorithms showed statistically significant performance gains with GP-collected data than trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as the GP consistently generated high-quality datasets. This study showcases the effective combination of GP's properties with offline learners, suggesting a promising avenue for future research in optimizing data collection for RL.</div></div>","PeriodicalId":48682,"journal":{"name":"Swarm and Evolutionary Computation","volume":"99 ","pages":"Article 102140"},"PeriodicalIF":8.5000,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Using genetic programming to improve data collection for offline reinforcement learning\",\"authors\":\"David Halder, Georgios Douzas, Fernando Bacao\",\"doi\":\"10.1016/j.swevo.2025.102140\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Offline Reinforcement Learning (RL) learns policies solely from fixed pre-collected datasets, making it applicable to use-cases where data collection is expensive or risky. Consequently, the performance of these offline learners is highly dependent on the dataset used. Still the questions of how this data is collected and what dataset characteristics are needed are not thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), combining the two learning paradigms to exploit their supplementary attributes. This study aims to join these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL and its potential to enhance the performance of offline RL algorithms. 
A comparative approach was employed, comparing Deep Q-Networks (DQN) and GP for data collection across multiple environments and collection modes. The exploration and exploitation capabilities of these methods were quantified and a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation compared to DQN and produced high trajectory quality datasets across all environments. More offline algorithms showed statistically significant performance gains with GP-collected data than trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as the GP consistently generated high-quality datasets. This study showcases the effective combination of GP's properties with offline learners, suggesting a promising avenue for future research in optimizing data collection for RL.</div></div>\",\"PeriodicalId\":48682,\"journal\":{\"name\":\"Swarm and Evolutionary Computation\",\"volume\":\"99 \",\"pages\":\"Article 102140\"},\"PeriodicalIF\":8.5000,\"publicationDate\":\"2025-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Swarm and Evolutionary Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2210650225002974\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Swarm and Evolutionary Computation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2210650225002974","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Using genetic programming to improve data collection for offline reinforcement learning
Abstract:
Offline Reinforcement Learning (RL) learns policies solely from fixed, pre-collected datasets, making it applicable to use cases where data collection is expensive or risky. Consequently, the performance of these offline learners is highly dependent on the dataset used. Still, the questions of how this data is collected and which dataset characteristics are needed have not been thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), which combines the two learning paradigms to exploit their complementary attributes. This study aims to bridge these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL, as well as its potential to enhance the performance of offline RL algorithms. A comparative approach was employed, comparing Deep Q-Networks (DQN) and GP for data collection across multiple environments and collection modes. The exploration and exploitation capabilities of these methods were quantified, and a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation than DQN and produced high-quality trajectory datasets across all environments. More offline algorithms showed statistically significant performance gains when trained on GP-collected data than when trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as GP consistently generated high-quality datasets. This study showcases the effective combination of GP's properties with offline learners, suggesting a promising avenue for future research in optimizing data collection for RL.
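To make the described pipeline concrete, the sketch below illustrates the generic offline data-collection loop the abstract refers to: a behavior policy interacts with an environment, and the logged transitions form the fixed dataset on which offline learners are later trained. The environment name, the random stand-in policy, and the function names are illustrative assumptions rather than the authors' implementation, in which the behavior policy would be a trained DQN agent or a GP-evolved program.

```python
# Minimal sketch of offline-RL data collection, assuming a Gymnasium-style
# environment. The environment, the random stand-in policy, and all names
# here are illustrative; in the study the behavior policy would be a DQN
# agent or a GP-evolved program.

import gymnasium as gym


def collect_dataset(env, policy, num_episodes=10):
    """Roll out `policy` in `env` and return (s, a, r, s', done) transitions."""
    dataset = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy(obs)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            dataset.append((obs, action, reward, next_obs, done))
            obs = next_obs
    return dataset


if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    # Random policy as a placeholder for a DQN- or GP-based behavior policy.
    random_policy = lambda obs: env.action_space.sample()
    data = collect_dataset(env, random_policy, num_episodes=5)
    print(f"Collected {len(data)} transitions for offline training.")
```

Under this reading of the abstract, the comparison in the study comes from swapping the behavior policy (DQN versus GP) and the collection mode, then training several offline algorithms on the resulting fixed datasets.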
Journal introduction:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing systems, Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.