{"title":"Evolving code with a large language model","authors":"Erik Hemberg, Stephen Moskal, Una-May O’Reilly","doi":"10.1007/s10710-024-09494-2","DOIUrl":"https://doi.org/10.1007/s10710-024-09494-2","url":null,"abstract":"<p>Algorithms that use Large Language Models (LLMs) to evolve code arrived on the Genetic Programming (GP) scene very recently. We present LLM_GP, a general LLM-based evolutionary algorithm designed to evolve code. Like GP, it uses evolutionary operators, but its designs and implementations of those operators significantly differ from GP’s because they enlist an LLM, using prompting and the LLM’s pre-trained pattern matching and sequence completion capability. We also present a demonstration-level variant of LLM_GP and share its code. By presentations that range from formal to hands-on, we cover design and LLM-usage considerations as well as the scientific challenges that arise when using an LLM for genetic programming.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"5 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142207902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hga-lstm: LSTM architecture and hyperparameter search by hybrid GA for air pollution prediction","authors":"Jiayu Liang, Yaxin Lu, Mingming Su","doi":"10.1007/s10710-024-09493-3","DOIUrl":"https://doi.org/10.1007/s10710-024-09493-3","url":null,"abstract":"<p>Air pollution prediction is a process of predicting the levels of air pollutants in a specific area over a given period. Since LSTM (Long Short-Term Memory) networks are particularly effective in capturing long-term dependencies and patterns in sequential data, they are widely-used for air pollution prediction. However, designing appropriate LSTM architectures and hyperparameters for given tasks can be challenging, which are normally determined by users in existing LSTM-based methods. Note that Genetic Algorithm (GA) is an effective optimization technique, and local search in augmenting the global search ability of GA has been proved, which is rarely considered by existing GA-optimzied LSTM methods. In this work, simultaneous LSTM architecture and hyperparameter search based on GA and local search techniques is investigated for air pollution prediction. Specifically, a new LSTM model search method is designed, termed as HGA-LSTM. HGA is a hybrid GA, which is proposed by integrating GA with local search adaptively. Based on HGA, HGA-LSTM is developed to search for LSTM models with simultaneous LSTM architecture and hyperparameter optimization. In HGA-LSTM, a new crossover is designed to be adaptive to the variable-length representation of LSTM models. The proposed HGA-LSTM is compared with widely-used LSTM-based and nonLSTM-based prediction methods on UCI (University of California Irvine) datasets for air pollution prediction. Results show that HGA-LSTM is generally better than both types of reference methods with its evolved LSTM models achieving lower mean square/absolute errors. Moreover, compared with a baseline method (a GA without local search), HGA-LSTM converges to lower error values, which reflects that HGA has better search ability than GA.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"23 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141882004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey on dynamic populations in bio-inspired algorithms","authors":"Davide Farinati, Leonardo Vanneschi","doi":"10.1007/s10710-024-09492-4","DOIUrl":"https://doi.org/10.1007/s10710-024-09492-4","url":null,"abstract":"<p>Population-Based Bio-Inspired Algorithms (PBBIAs) are computational methods that simulate natural biological processes, such as evolution or social behaviors, to solve optimization problems. Traditionally, PBBIAs use a population of static size, set beforehand through a specific parameter. Nevertheless, for several decades now, the idea of employing populations of dynamic size, capable of adjusting during the course of a single run, has gained ground. Various methods have been introduced, ranging from simpler ones that use a predefined function to determine the population size variation, to more sophisticated methods where the population size in different phases of the evolutionary process depends on the dynamics of the evolution itself and events occurring within the population during the run. The common underlying idea in many of these approaches, is similar: to save a significant amount of computational effort in phases where the evolution is functioning well, and therefore a large population is not needed. This allows for reusing the previously saved computational effort when optimization becomes more challenging, and hence a greater computational effort is required. Numerous past contributions have demonstrated a notable advantage of using dynamically sized populations, often resulting in comparable results to those obtained by the standard PBBIAs but with a significant saving of computational effort. However, despite the numerous successes that have been presented, to date, there is still no comprehensive collection of past contributions on the use of dynamic populations that allows for their categorization and critical analysis. This article aims to bridge this gap by presenting a systematic literature review regarding the use of dynamic populations in PBBIAs, as well as identifying gaps in the research that can lead the path to future works.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"68 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GSGP-hardware: instantaneous symbolic regression with an FPGA implementation of geometric semantic genetic programming","authors":"Yazmin Maldonado, Ruben Salas, Joel A. Quevedo, Rogelio Valdez, Leonardo Trujillo","doi":"10.1007/s10710-024-09491-5","DOIUrl":"https://doi.org/10.1007/s10710-024-09491-5","url":null,"abstract":"<p>Geometric Semantic Genetic Programming (GSGP) proposed an important enhancement to GP-based learning, incorporating search operators that operate directly on the semantics of the parents with bounded effects on the semantics of the offspring. This approach posed any symbolic regression fitness landscape as a unimodal function, allowing for more directed search. Moreover, it became evident that the search could be implemented in a much more efficient manner, that does not require the execution, evaluation or manipulation of variable length syntactic models. Hence, efficient implementations of this algorithm have been developed using both CPU and GPU processing. However, current implementations are still ill-suited for real-time learning, or learning on devices with limited resources, scenarios that are becoming more prevalent with the continued development of the Internet-of-Things and the increased need for efficient and distributed learning on the Edge. This paper presents GSGP-Hardware, a fully pipelined and parallel design of GSGP developed fully using VHDL, for implementation on FPGA devices. Using Vivado AMD-Xilinx for synthesis and simulation, GSGP-Hardware achieves an approximate improvement in efficiency, in terms of run time and Gpops/s, of three and four orders of magnitude, respectively, compared with the state-of-the-art GPU implementation. This is a performance increase that has not been achieved by other FPGA-based implementations of genetic programming. This is possible due to the manner in which GSGP evolves a model, and competitive accuracy is achieved by incorporating simple but powerful enhancements to the original GSGP algorithm. GSGP-Hardware allows for instantaneous symbolic regression, opening up new application domains for this powerful variant of genetic programming.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"5 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141501115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geometric semantic GP with linear scaling: Darwinian versus Lamarckian evolution","authors":"Giorgia Nadizar, Berfin Sakallioglu, Fraser Garrow, Sara Silva, Leonardo Vanneschi","doi":"10.1007/s10710-024-09488-0","DOIUrl":"https://doi.org/10.1007/s10710-024-09488-0","url":null,"abstract":"<p>Geometric Semantic Genetic Programming (GSGP) has shown notable success in symbolic regression with the introduction of Linear Scaling (LS). This achievement stems from the synergy of the geometric semantic genetic operators of GSGP with the scaling of the individuals for computing their fitness, which favours programs with a promising behaviour. However, the initial combination of GSGP and LS (GSGP-LS) underutilised the potential of LS, scaling individuals only for fitness evaluation, neglecting to incorporate improvements into their genetic material. In this paper we propose an advancement, GSGP with Lamarckian LS (GSGP-LLS), wherein we update the individuals in the population with their scaling coefficients in a Lamarckian fashion, i.e., by inheritance of acquired traits. We assess GSGP-LS and GSGP-LLS against standard GSGP for the task of symbolic regression on five hand-tailored benchmarks and six real-life problems. On the former ones, GSGP-LS and GSGP-LLS both consistently improve GSGP, though with no clear global superiority between them. On the real-world problems, instead, GSGP-LLS steadily outperforms GSGP-LS, achieving faster convergence and superior final performance. Notably, even in cases where LS induces overfitting on challenging problems, GSGP-LLS surpasses GSGP-LS, due to its slower and more localised optimisation steps.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"9 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141195445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical non-dominated sort: analysis and improvement","authors":"Ved Prakash, Sumit Mishra","doi":"10.1007/s10710-024-09487-1","DOIUrl":"https://doi.org/10.1007/s10710-024-09487-1","url":null,"abstract":"<p>Pareto dominance-based multiobjective evolutionary algorithms use non-dominated sorting to rank their solutions. In the last few decades, various approaches have been proposed for non-dominated sorting. However, the running time analysis of some of the approaches has some issues and they are imprecise. In this paper, we focus on one such algorithm namely hierarchical non-dominated sort (HNDS), where the running time is imprecise and obtain the generic equations that show the number of dominance comparisons in the worst and the best case. Based on the equation for the worst case, we obtain the worst-case running time as well as the scenario where the worst case occurs. Based on the equation for the best case, we identify a scenario where HNDS performs less number of dominance comparisons than that presented in the original paper, making the best-case analysis of the original paper unrigorous. In the end, we present an improved version of HNDS which guarantees the claimed worst-case time complexity by the authors of HNDS which is <span>({mathcal {O}}(MN^2))</span>.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"4 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140574831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A genetic programming approach to the automated design of CNN models for image classification and video shorts creation","authors":"","doi":"10.1007/s10710-024-09483-5","DOIUrl":"https://doi.org/10.1007/s10710-024-09483-5","url":null,"abstract":"<h3>Abstract</h3> <p>Neural architecture search (NAS) is a rapidly growing field which focuses on the automated design of neural network architectures. Genetic algorithms (GAs) have been predominantly used for evolving neural network architectures. Genetic programming (GP), a variation of GAs that work in the program space rather than a solution space, has not been as well researched for NAS. This paper aims to contribute to the research into GP for NAS. Previous research in this field can be divided into two categories. In the first each program represents neural networks directly or components and parameters of neural networks. In the second category each program is a set of instructions, which when executed, produces a neural network. This study focuses on this second category which has not been well researched. Previous work has used grammatical evolution for generating these programs. This study examines canonical GP for neural network design (GPNND) for this purpose. It also evaluates a variation of GP, iterative structure-based GP (ISBGP) for evolving these programs. The study compares the performance of GAs, GPNND and ISBGP for image classification and video shorts creation. Both GPNND and ISBGP were found to outperform GAs, with ISBGP producing better results than GPNND for both applications. Both GPNND and ISBGP produced better results than previous studies employing grammatical evolution on the CIFAR-10 dataset.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"64 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140155652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ensemble learning interpretation of geometric semantic genetic programming","authors":"Grant Dick","doi":"10.1007/s10710-024-09482-6","DOIUrl":"https://doi.org/10.1007/s10710-024-09482-6","url":null,"abstract":"<p>Geometric semantic genetic programming (GSGP) is a variant of genetic programming (GP) that directly searches the semantic space of programs to produce candidate solutions. GSGP has shown considerable success in improving the performance of GP in terms of program correctness, however this comes at the expense of exponential program growth. Subsequent attempts to address this growth have not fully-exploited the fact that GSGP searches by producing linear combinations of existing solutions. This paper examines this property of GSGP and frames the method as an ensemble learning approach by redefining mutation and crossover as examples of boosting and stacking, respectively. The ensemble interpretation allows for simple integration of regularisation techniques that significantly reduce the size of the resultant programs. Additionally, this paper examines the quality of parse tree base learners within this ensemble learning interpretation of GSGP and suggests that future research could substantially improve the quality of GSGP by examining more effective initialisation techniques. The resulting ensemble learning interpretation leads to variants of GSGP that substantially improve upon the performance of traditional GSGP in regression contexts, and produce a method that frequently outperforms gradient boosting.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"1 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140099841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cellular geometric semantic genetic programming","authors":"Lorenzo Bonin, Luigi Rovito, Andrea De Lorenzo, Luca Manzoni","doi":"10.1007/s10710-024-09480-8","DOIUrl":"https://doi.org/10.1007/s10710-024-09480-8","url":null,"abstract":"<p>Among the different variants of Genetic Programming (GP), Geometric Semantic GP (GSGP) has proved to be both efficient and effective in finding good solutions. The fact that the operators of GSGP operate on the <i>semantics</i> of the individuals in a clear way provides guarantees on the way the search is performed. GSGP is not, however, free from limitations like the premature convergence of the population to a small–and possibly sub-optimal–area of the search space. One reason for this issue could be the fact that good individuals can quickly “spread” in the population suppressing the emergence of competition. To mitigate this problem, we impose a cellular automata (CA) inspired communication topology over GSGP. In CAs a collection of agents (as finite state automata) are positioned in a <i>n</i>-dimensional periodic grid and communicates only locally with the automata in their neighbourhoods. Similarly, we assign a location to each individual on an <i>n</i>-dimensional grid and the entire evolution for an individual will happen locally by considering, for each individual, only the individuals in its neighbourhood. Specifically, we present an algorithm in which, for each generation, a subset of the neighbourhood of each individual is sampled and the selection for the given cell in the grid is performed by extracting the two best individuals of this subset, which are employed as parents for the Geometric Semantic Crossover. We compare this <i>cellular GSGP</i> (cGSGP) approach with standard GSGP on eight regression problems, showing that it can provide better solutions than GSGP. Moreover, by analyzing convergence rates, we show that the improvement is observable regardless of the number of executed generations. As a side effect, we additionally show that combining a small-neighbourhood-based cellular spatial structure with GSGP helps in producing smaller solutions. Finally, we measure the spatial autocorrelation of the population by adopting the Moran’s I coefficient to provide an overview of the diversity, showing that our cellular spatial structure helps in providing better diversity during the early stages of the evolution.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"402 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139926735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural network crossover in genetic algorithms using genetic programming","authors":"Kyle Pretorius, Nelishia Pillay","doi":"10.1007/s10710-024-09481-7","DOIUrl":"https://doi.org/10.1007/s10710-024-09481-7","url":null,"abstract":"<p>The use of genetic algorithms (GAs) to evolve neural network (NN) weights has risen in popularity in recent years, particularly when used together with gradient descent as a mutation operator. However, crossover operators are often omitted from such GAs as they are seen as being highly destructive and detrimental to the performance of the GA. Designing crossover operators that can effectively be applied to NNs has been an active area of research with success limited to specific problem domains. The focus of this study is to use genetic programming (GP) to automatically evolve crossover operators that can be applied to NN weights and used in GAs. A novel GP is proposed and used to evolve both reusable and disposable crossover operators to compare their efficiency. Experiments are conducted to compare the performance of GAs using no crossover operator or a commonly used human designed crossover operator to GAs using GP evolved crossover operators. Results from experiments conducted show that using GP to evolve disposable crossover operators leads to highly effectively crossover operators that significantly improve the results obtained from the GA.</p>","PeriodicalId":50424,"journal":{"name":"Genetic Programming and Evolvable Machines","volume":"17 1","pages":""},"PeriodicalIF":2.6,"publicationDate":"2024-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139926733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}