Using crossover based similarity measure to improve genetic programming generalization ability

Proceedings of the 11th Annual conference on Genetic and evolutionary computation
Published: 2009-07-08
DOI: 10.1145/1569901.1570054

L. Vanneschi, Steven M. Gustafson

Citations: 24

Abstract

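Generalization is a central issue in machine learning. This paper presents a new idea for improving the generalization ability of Genetic Programming (GP). The idea is based on a dynamic two-layer selection algorithm and is tested on a real-life drug discovery regression application. The algorithm starts with root mean squared error (RMSE) as fitness and standard tournament selection. A list of individuals called "repulsors" is kept in memory and initialized as empty. Whenever an individual is found to overfit the training set, it is inserted into the list of repulsors. As soon as the list of repulsors is non-empty, selection becomes a two-layer algorithm: the individuals participating in the tournament are no longer chosen at random from the population but are themselves selected, using their average dissimilarity to the repulsors as a criterion to be maximized. Two similarity/dissimilarity measures are tested for this purpose: the well-known structural (or edit) distance and the recently defined subtree crossover based similarity measure. Although simple, this idea appears to improve GP generalization ability, and the experimental results show that GP generalizes better when the subtree crossover based similarity measure is used, at least for the test problems studied in this paper.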
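The two-layer selection scheme described in the abstract can be sketched in Python as follows. This is a minimal, hedged illustration, not the authors' implementation. Several pieces are assumptions: the tree encoding (nested tuples over `+`, `-`, `*`, the variable `x` and float constants), the overfitting test `overfits` (validation RMSE above a hypothetical `ratio` times training RMSE), the tournament sizes, and the reading of layer one as a small tournament that maximizes average dissimilarity to the repulsors. `structural_distance` is a simplified top-down node comparison standing in for the structural (edit) distance. The subtree crossover based similarity measure is not implemented here, but any two-tree function can be passed as `dist`.

```python
import math
import random

# Hypothetical encoding: a tree is "x", a float constant, or a tuple (op, left, right).
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def evaluate(tree, x):
    """Evaluate a tree on a single input value x."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if tree == "x" else float(tree)


def rmse(tree, xs, ys):
    """Root mean squared error, used as fitness (lower is better)."""
    return math.sqrt(sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def size(tree):
    """Number of nodes in a tree."""
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1


def structural_distance(a, b):
    """Simplified stand-in for structural distance: top-down node-by-node comparison."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        root = 0 if a[0] == b[0] else 1
        return root + structural_distance(a[1], b[1]) + structural_distance(a[2], b[2])
    if isinstance(a, tuple) or isinstance(b, tuple):
        # Leaf vs. subtree: the roots mismatch and the subtree's other nodes have no match.
        return max(size(a), size(b))
    return 0 if a == b else 1


def overfits(train_err, val_err, ratio=1.5):
    """Hypothetical overfitting test: validation error well above training error."""
    return val_err > ratio * train_err


def update_repulsors(pop, train_errs, val_errs, repulsors):
    """Append every overfitting individual to the repulsor list, skipping duplicates."""
    for ind, tr, va in zip(pop, train_errs, val_errs):
        if overfits(tr, va) and ind not in repulsors:
            repulsors.append(ind)


def avg_dissimilarity(ind, repulsors, dist):
    """Mean distance from one individual to all repulsors."""
    return sum(dist(ind, r) for r in repulsors) / len(repulsors)


def two_layer_select(pop, fitness, repulsors, dist, tour_size=4, inner_size=3, rng=random):
    """Return one parent; fitness[i] is the training RMSE of pop[i]."""
    if not repulsors:
        # Empty repulsor list: plain tournament, participants drawn at random.
        entrants = rng.sample(range(len(pop)), tour_size)
    else:
        entrants = []
        for _ in range(tour_size):
            # Layer 1: each participant wins a small tournament that maximizes
            # its average dissimilarity to the repulsors.
            cands = rng.sample(range(len(pop)), inner_size)
            entrants.append(max(cands, key=lambda i: avg_dissimilarity(pop[i], repulsors, dist)))
    # Layer 2: the usual tournament on fitness (lowest RMSE wins).
    return pop[min(entrants, key=lambda i: fitness[i])]


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]  # toy data for y = 2x + 1, not from the paper
    pop = [("+", ("*", 2.0, "x"), 1.0), ("*", "x", "x"), ("+", "x", 1.0), "x"]
    fitness = [rmse(t, xs, ys) for t in pop]
    repulsors = [("*", "x", "x")]  # pretend this tree was flagged as overfitting
    print(two_layer_select(pop, fitness, repulsors, structural_distance,
                           tour_size=2, inner_size=2, rng=random.Random(0)))
```

Because `dist` is a parameter, comparing the two measures studied in the paper amounts to swapping the function passed to `two_layer_select`. When the repulsor list is empty, the function reduces to ordinary tournament selection on RMSE.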

Source journal

Proceedings of the 11th Annual conference on Genetic and evolutionary computation

Self-citation rate

0.00%

Publication count