Informed Down-Sampled Lexicase Selection: Identifying Productive Training Cases for Efficient Problem Solving

IF 3.4 | Zone 2 (Computer Science) | Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Evolutionary Computation Pub Date : 2024-12-02 DOI:10.1162/evco_a_00346

Ryan Boldi;Martin Briesch;Dominik Sobania;Alexander Lalejini;Thomas Helmuth;Franz Rothlauf;Charles Ofria;Lee Spector

Evolutionary Computation, vol. 32, no. 4, pp. 307-337. Journal Article. Listing: https://ieeexplore.ieee.org/document/10902657/

Citations: 0

Abstract


Genetic Programming (GP) often uses large training sets and requires all individuals to be evaluated on all training cases during selection. Random down-sampled lexicase selection evaluates individuals on only a random subset of the training cases, allowing for more individuals to be explored with the same number of program executions. However, sampling randomly can exclude important cases from the down-sample for a number of generations, while cases that measure the same behavior (synonymous cases) may be overused. In this work, we introduce Informed Down-Sampled Lexicase Selection. This method leverages population statistics to build down-samples that contain more distinct and therefore informative training cases. Through an empirical investigation across two different GP systems (PushGP and Grammar-Guided GP), we find that informed down-sampling significantly outperforms random down-sampling on a set of contemporary program synthesis benchmark problems. Through an analysis of the created down-samples, we find that important training cases are included in the down-sample consistently across independent evolutionary runs and systems. We hypothesize that this improvement can be attributed to the ability of Informed Down-Sampled Lexicase Selection to maintain more specialist individuals over the course of evolution, while still benefiting from reduced per-evaluation costs.
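The selection scheme described in the abstract can be illustrated with a small sketch. The abstract does not say how the informed down-sample is built beyond "population statistics", so the code below assumes one plausible heuristic: a farthest-first traversal over per-case pass/fail vectors, which tends to skip synonymous cases. All function names and the error-matrix layout are hypothetical and chosen for illustration only. For simplicity the sketch reads a full error matrix, so it does not reproduce the cost savings the abstract mentions; in practice those statistics would have to be estimated more cheaply.

```python
import random


def lexicase_select(errors, case_ids, rng):
    """Select one individual index by lexicase selection over case_ids.

    errors[i][c] is the error of individual i on training case c (lower is better).
    """
    candidates = list(range(len(errors)))
    order = list(case_ids)
    rng.shuffle(order)  # each selection event uses a fresh random case order
    for c in order:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:
            break
    return rng.choice(candidates)


def random_down_sample(num_cases, size, rng):
    """Random down-sampling: pick `size` distinct case indices uniformly."""
    return rng.sample(range(num_cases), size)


def informed_down_sample(errors, size, rng):
    """Assumed heuristic (not confirmed by the abstract): farthest-first
    traversal, so that each added case differs as much as possible from the
    cases already in the down-sample."""
    num_cases = len(errors[0])
    # Pass/fail vector of each case across the population.
    solve = [tuple(ind[c] == 0 for ind in errors) for c in range(num_cases)]

    def dist(a, b):
        # Hamming distance between two cases' pass/fail vectors.
        return sum(x != y for x, y in zip(solve[a], solve[b]))

    chosen = [rng.randrange(num_cases)]
    while len(chosen) < size:
        remaining = [c for c in range(num_cases) if c not in chosen]
        scores = {c: min(dist(c, s) for s in chosen) for c in remaining}
        top = max(scores.values())
        chosen.append(rng.choice([c for c in remaining if scores[c] == top]))
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy population: cases 0-2 are synonymous, case 3 measures something else.
    errors = [
        [0, 0, 0, 1],
        [0, 0, 0, 1],
        [1, 1, 1, 0],
        [1, 1, 1, 1],
    ]
    sample = informed_down_sample(errors, 2, rng)
    parent = lexicase_select(errors, sample, rng)
    print("down-sample:", sample, "selected parent:", parent)
```

In the toy population, a random down-sample of size 2 can easily contain only synonymous cases, whereas the farthest-first sketch always includes the one distinct case, so the individual that specializes on it remains selectable.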


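Source journal

Evolutionary Computation (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 6.40
Self-citation rate: 1.50%
Publication volume: not listed
Review time: 3 months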

About the journal: Evolutionary Computation is a leading journal in its field. It provides an international forum for the exchange of information among researchers working on the theoretical and practical aspects of computational systems that draw their inspiration from nature, with particular emphasis on evolutionary models of computation such as genetic algorithms, evolutionary strategies, classifier systems, evolutionary programming, and genetic programming. It welcomes articles from related fields such as swarm intelligence (e.g., Ant Colony Optimization and Particle Swarm Optimization) and other nature-inspired computation paradigms (e.g., Artificial Immune Systems). In addition to articles describing theoretical and/or experimental work, the journal welcomes application-focused papers describing breakthrough results in an application domain, as well as methodological papers in which the specifics of a real-world problem led to significant algorithmic improvements that could be generalized to other areas.