Objective model selection with parallel genetic algorithms using an eradication strategy

Jean-François Plante, Maxime Larocque, Michel Adès

Journal Article · Published 2023-06-05 · DOI: 10.1002/cjs.11775
https://onlinelibrary.wiley.com/doi/10.1002/cjs.11775
In supervised learning, feature selection methods identify the most relevant predictors to include in a model. For linear models, the inclusion or exclusion of each variable may be represented as a vector of bits playing the role of the genetic material that defines the model. Genetic algorithms reproduce the strategies of natural selection on a population of models to identify the best one. We derive the distribution of the importance scores for parallel genetic algorithms under the null hypothesis that none of the features has predictive power. These null distributions thus provide an objective threshold for feature selection that does not require the visual inspection of a bubble plot. We also introduce the eradication strategy, akin to forward stepwise selection, in which the genes of useful variables are sequentially forced into the models. The method is illustrated on real data, and simulation studies are run to describe its performance.
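To make the bit-vector encoding concrete, the sketch below shows a basic genetic algorithm for feature selection in a linear model. This is an illustrative toy, not the authors' implementation: the fitness function (negative AIC of an OLS fit), the selection scheme (truncation), and all parameter values are assumptions chosen for simplicity, and the parallel runs, importance scores, and eradication strategy of the paper are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(bits, X, y):
    """Negative AIC of the OLS model using the predictors where bits == 1."""
    k = int(bits.sum())
    n = len(y)
    if k == 0:
        resid = y - y.mean()          # intercept-only model
    else:
        Xs = X[:, bits.astype(bool)]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
    rss = resid @ resid
    aic = n * np.log(rss / n) + 2 * (k + 1)
    return -aic                        # higher is better

def evolve(X, y, pop_size=30, n_gen=40, p_mut=0.05):
    """Evolve a population of bit vectors; return the fittest chromosome."""
    p = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, p))
    for _ in range(n_gen):
        scores = np.array([fitness(b, X, y) for b in pop])
        order = np.argsort(scores)[::-1]
        parents = pop[order[: pop_size // 2]]   # truncation selection
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, p)            # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(p) < p_mut        # bit-flip mutation
            child = np.where(flip, 1 - child, child)
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(b, X, y) for b in pop])
    return pop[np.argmax(scores)]

# Toy data: only the first two of six predictors carry signal.
n = 200
X = rng.standard_normal((n, 6))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)
best = evolve(X, y)
```

On this toy problem, the fittest chromosome should set the bits for the two informative predictors, since dropping either inflates the residual sum of squares far more than the AIC penalty for keeping it.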