{"title":"利用遗传规划改进离线强化学习的数据收集","authors":"David Halder, Georgios Douzas, Fernando Bacao","doi":"10.1016/j.swevo.2025.102140","DOIUrl":null,"url":null,"abstract":"<div><div>Offline Reinforcement Learning (RL) learns policies solely from fixed pre-collected datasets, making it applicable to use-cases where data collection is expensive or risky. Consequently, the performance of these offline learners is highly dependent on the dataset used. Still the questions of how this data is collected and what dataset characteristics are needed are not thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), combining the two learning paradigms to exploit their supplementary attributes. This study aims to join these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL and its potential to enhance the performance of offline RL algorithms. A comparative approach was employed, comparing Deep Q-Networks (DQN) and GP for data collection across multiple environments and collection modes. The exploration and exploitation capabilities of these methods were quantified and a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation compared to DQN and produced high trajectory quality datasets across all environments. More offline algorithms showed statistically significant performance gains with GP-collected data than trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as the GP consistently generated high-quality datasets. This study showcases the effective combination of GP's properties with offline learners, suggesting a promising avenue for future research in optimizing data collection for RL.</div></div>","PeriodicalId":48682,"journal":{"name":"Swarm and Evolutionary Computation","volume":"99 ","pages":"Article 102140"},"PeriodicalIF":8.5000,"publicationDate":"2025-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Using genetic programming to improve data collection for offline reinforcement learning\",\"authors\":\"David Halder, Georgios Douzas, Fernando Bacao\",\"doi\":\"10.1016/j.swevo.2025.102140\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Offline Reinforcement Learning (RL) learns policies solely from fixed pre-collected datasets, making it applicable to use-cases where data collection is expensive or risky. Consequently, the performance of these offline learners is highly dependent on the dataset used. Still the questions of how this data is collected and what dataset characteristics are needed are not thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), combining the two learning paradigms to exploit their supplementary attributes. This study aims to join these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL and its potential to enhance the performance of offline RL algorithms. 
A comparative approach was employed, comparing Deep Q-Networks (DQN) and GP for data collection across multiple environments and collection modes. The exploration and exploitation capabilities of these methods were quantified and a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation compared to DQN and produced high trajectory quality datasets across all environments. More offline algorithms showed statistically significant performance gains with GP-collected data than trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as the GP consistently generated high-quality datasets. This study showcases the effective combination of GP's properties with offline learners, suggesting a promising avenue for future research in optimizing data collection for RL.</div></div>\",\"PeriodicalId\":48682,\"journal\":{\"name\":\"Swarm and Evolutionary Computation\",\"volume\":\"99 \",\"pages\":\"Article 102140\"},\"PeriodicalIF\":8.5000,\"publicationDate\":\"2025-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Swarm and Evolutionary Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2210650225002974\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Swarm and Evolutionary Computation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2210650225002974","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Using genetic programming to improve data collection for offline reinforcement learning
Abstract:
Offline Reinforcement Learning (RL) learns policies solely from fixed, pre-collected datasets, making it applicable to use cases where data collection is expensive or risky. Consequently, the performance of these offline learners is highly dependent on the dataset used. Still, the questions of how this data is collected and which dataset characteristics are needed have not been thoroughly investigated. Simultaneously, evolutionary methods have reemerged as a promising alternative to classic RL, leading to the field of evolutionary RL (EvoRL), which combines the two learning paradigms to exploit their complementary attributes. This study aims to bridge these research directions and examine the effects of Genetic Programming (GP) on dataset characteristics in RL, as well as its potential to enhance the performance of offline RL algorithms. A comparative approach was employed, comparing Deep Q-Networks (DQN) and GP for data collection across multiple environments and collection modes. The exploration and exploitation capabilities of these methods were quantified, and a comparative analysis was conducted to determine whether data collected through GP led to superior performance in multiple offline learners. The findings indicate that GP demonstrates strong and stable performance in generating high-quality experiences with competitive exploration. GP exhibited lower uncertainty in experience generation than DQN and produced high-quality trajectory datasets across all environments. More offline algorithms showed statistically significant performance gains when trained on GP-collected data than when trained on DQN-collected trajectories. Furthermore, their performance was less dependent on the environment, as GP consistently generated high-quality datasets. This study showcases the effective combination of GP's properties with offline learners, suggesting a promising avenue for future research in optimizing data collection for RL.
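To make the described pipeline concrete, the sketch below illustrates the generic offline data-collection loop the abstract refers to: a behavior policy interacts with an environment, and the logged transitions form the fixed dataset on which offline learners are later trained. The environment name, the random stand-in policy, and the function names are illustrative assumptions rather than the authors' implementation, in which the behavior policy would be a trained DQN agent or a GP-evolved program.

```python
# Minimal sketch of offline-RL data collection, assuming a Gymnasium-style
# environment. The environment, the random stand-in policy, and all names
# here are illustrative; in the study the behavior policy would be a DQN
# agent or a GP-evolved program.

import gymnasium as gym


def collect_dataset(env, policy, num_episodes=10):
    """Roll out `policy` in `env` and return (s, a, r, s', done) transitions."""
    dataset = []
    for _ in range(num_episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            action = policy(obs)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            dataset.append((obs, action, reward, next_obs, done))
            obs = next_obs
    return dataset


if __name__ == "__main__":
    env = gym.make("CartPole-v1")
    # Random policy as a placeholder for a DQN- or GP-based behavior policy.
    random_policy = lambda obs: env.action_space.sample()
    data = collect_dataset(env, random_policy, num_episodes=5)
    print(f"Collected {len(data)} transitions for offline training.")
```

Under this reading of the abstract, the comparison in the study comes from swapping the behavior policy (DQN versus GP) and the collection mode, then training several offline algorithms on the resulting fixed datasets.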
Journal introduction:
Swarm and Evolutionary Computation is a pioneering peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. Additionally, it welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing systems, Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.