Objective model selection with parallel genetic algorithms using an eradication strategy

Pub Date : 2023-06-05 DOI:10.1002/cjs.11775
Jean-François Plante, Maxime Larocque, Michel Adès
{"title":"Objective model selection with parallel genetic algorithms using an eradication strategy","authors":"Jean-François Plante,&nbsp;Maxime Larocque,&nbsp;Michel Adès","doi":"10.1002/cjs.11775","DOIUrl":null,"url":null,"abstract":"<p>In supervised learning, feature selection methods identify the most relevant predictors to include in a model. For linear models, the inclusion or exclusion of each variable may be represented as a vector of bits playing the role of the genetic material that defines the model. Genetic algorithms reproduce the strategies of natural selection on a population of models to identify the best. We derive the distribution of the importance scores for parallel genetic algorithms under the null hypothesis that none of the features has predictive power. They, hence, provide an objective threshold for feature selection that does not require the visual inspection of a bubble plot. We also introduce the eradication strategy, akin to forward stepwise selection, where the genes of useful variables are sequentially forced into the models. The method is illustrated on real data, and simulation studies are run to describe its performance.</p>","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/cjs.11775","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"100","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/cjs.11775","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In supervised learning, feature selection methods identify the most relevant predictors to include in a model. For linear models, the inclusion or exclusion of each variable may be represented as a vector of bits playing the role of the genetic material that defines the model. Genetic algorithms reproduce the strategies of natural selection on a population of models to identify the best. We derive the distribution of the importance scores for parallel genetic algorithms under the null hypothesis that none of the features has predictive power. They, hence, provide an objective threshold for feature selection that does not require the visual inspection of a bubble plot. We also introduce the eradication strategy, akin to forward stepwise selection, where the genes of useful variables are sequentially forced into the models. The method is illustrated on real data, and simulation studies are run to describe its performance.

Abstract Image

分享
查看原文
基于根除策略的并行遗传算法目标模型选择
在有监督学习中,特征选择方法可以确定模型中最相关的预测因子。对于线性模型来说,每个变量的加入或排除都可以用比特向量来表示,比特向量就像定义模型的遗传物质。遗传算法再现了对模型群体进行自然选择的策略,以找出最佳模型。在没有任何特征具有预测能力的零假设下,我们得出了并行遗传算法的重要性得分分布。因此,它们为特征选择提供了一个客观的阈值,而无需对气泡图进行目测。我们还引入了类似于前向逐步选择的根除策略,在这种策略中,有用变量的基因会被依次强制加入模型中。我们在真实数据上对该方法进行了说明,并进行了模拟研究以描述其性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信