Comparing Model Selection and Regularization Approaches to Variable Selection in Model-Based Clustering.

Journal of the SFdS, Pub Date: 2014-01-01
Gilles Celeux, Marie-Laure Martin-Magniette, Cathy Maugis-Rabusseau, Adrian E Raftery
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178956/pdf/nihms-547507.pdf

Abstract

We compare two major approaches to variable selection in clustering: model selection and regularization. Based on previous results, we select the method of Maugis et al. (2009b), which modified the method of Raftery and Dean (2006), as a current state-of-the-art model selection method, and the method of Witten and Tibshirani (2010) as a current state-of-the-art regularization method. We compare the methods by simulation in terms of their accuracy in both classification and variable selection. In the first simulation experiment, all the variables were conditionally independent given cluster membership. We found that variable selection (of either kind) yielded substantial gains in classification accuracy when the clusters were well separated, but little gain when the clusters were close together. The two variable selection methods had comparable classification accuracy, but the model selection approach was substantially more accurate in selecting variables. In the second simulation experiment, the variables were correlated given cluster membership. Here the model selection approach was substantially more accurate than the regularization approach in terms of both classification and variable selection, and both gave more accurate classifications than K-means without variable selection. However, the model selection approach cannot be applied in very high-dimensional settings.
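As a rough illustration of the kind of simulation design the abstract describes, the sketch below generates two clusters whose variables are conditionally independent given cluster membership, appends pure-noise variables, and compares K-means on all variables with K-means restricted to the informative ones. It is not the method of Maugis et al. (2009b) or of Witten and Tibshirani (2010); the "selection" here is an oracle that simply knows which columns are informative. All function names, sample sizes, and the separation value are assumptions chosen for illustration.

```python
import numpy as np


def simulate(n_per_cluster=100, n_noise=20, separation=6.0, seed=0):
    """Two Gaussian clusters: 2 informative variables plus n_noise noise variables.

    All variables are independent given the cluster label (unit variance).
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat([0, 1], n_per_cluster)
    # Informative variables: the cluster means differ by `separation`.
    informative = rng.normal(size=(2 * n_per_cluster, 2)) + separation * labels[:, None]
    # Noise variables: same distribution in both clusters.
    noise = rng.normal(size=(2 * n_per_cluster, n_noise))
    return np.hstack([informative, noise]), labels


def kmeans2(X, n_init=10, n_iter=100, seed=0):
    """Plain Lloyd's algorithm for K = 2, keeping the best of n_init restarts."""
    rng = np.random.default_rng(seed)
    best_assign, best_inertia = None, np.inf
    for _ in range(n_init):
        # Initialise the centers at two distinct observations.
        centers = X[rng.choice(len(X), size=2, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            assign = dist.argmin(axis=1)
            # Keep the old center if a cluster happens to be empty.
            new_centers = np.array([
                X[assign == k].mean(axis=0) if np.any(assign == k) else centers[k]
                for k in range(2)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        # Recompute the assignment and within-cluster sum of squares.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        inertia = dist[np.arange(len(X)), assign].sum()
        if inertia < best_inertia:
            best_assign, best_inertia = assign, inertia
    return best_assign


def error_rate(true_labels, predicted):
    """Misclassification rate for two clusters, up to label switching."""
    mismatch = float(np.mean(true_labels != predicted))
    return min(mismatch, 1.0 - mismatch)


X, y = simulate()
err_all = error_rate(y, kmeans2(X))             # no variable selection
err_relevant = error_rate(y, kmeans2(X[:, :2]))  # oracle: informative columns only
print(f"error, all variables:         {err_all:.3f}")
print(f"error, informative variables: {err_relevant:.3f}")
```

Lowering `separation` or raising `n_noise` gives a feel for how the gap between the two error rates depends on cluster separation and on the number of irrelevant variables, which is the question the simulation experiments address.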
