Sparse Group Penalties for bi-level variable selection

IF 1.3 · Zone 3 (Biology) · Q4 Mathematical & Computational Biology
Gregor Buch, Andreas Schulz, Irene Schmidtmann, Konstantin Strauch, Philipp S. Wild
Biometrical Journal, vol. 66, no. 4. Published 2024-05-15. DOI: 10.1002/bimj.202200334. https://onlinelibrary.wiley.com/doi/10.1002/bimj.202200334
Citations: 0

Abstract

Many data sets exhibit a natural group structure due to contextual similarities or high correlations among variables, such as lipid markers that are interrelated based on biochemical principles. Bi-level selection methods can use knowledge of such groupings to identify relevant feature groups and highlight their predictive members. One of the best-known approaches of this kind combines the classical Least Absolute Shrinkage and Selection Operator (LASSO) with the Group LASSO, resulting in the Sparse Group LASSO (SGL). We propose the Sparse Group Penalty (SGP) framework, which allows for a flexible combination of different SGL-style shrinkage conditions. Analogous to SGL, we investigated the combination of the Smoothly Clipped Absolute Deviation (SCAD), the Minimax Concave Penalty (MCP), and the Exponential Penalty (EP) with their group versions, resulting in the Sparse Group SCAD, the Sparse Group MCP, and the novel Sparse Group EP (SGE). These shrinkage operators provide refined control of the effect of group formation on the selection process through a tuning parameter. In simulation studies, SGPs were compared with other bi-level selection methods (Group Bridge, composite MCP, and Group Exponential LASSO) for variable and group selection, evaluated with the Matthews correlation coefficient. We demonstrated the advantages of the new SGE in identifying parsimonious models, but also identified scenarios that highlight the limitations of the approach. The performance of the techniques was further investigated in a real-world use case for the selection of regulated lipids in a randomized clinical trial.
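
As a rough illustration of two ingredients named in the abstract, the sketch below computes a Sparse Group LASSO penalty value and a Matthews correlation coefficient for a variable-selection result. The penalty uses a commonly cited parametrization with a mixing weight `alpha` and square-root group-size weights. The abstract does not state the exact form used in the paper, so that choice, and every function and parameter name here, is an assumption made for illustration only.

```python
import math


def sparse_group_lasso_penalty(beta, groups, lam, alpha):
    """Penalty value under an assumed, commonly used SGL parametrization.

    beta   : list of coefficients
    groups : list of group labels, one per coefficient
    lam    : overall penalty strength (>= 0)
    alpha  : mixing weight in [0, 1]; 1 gives the LASSO term only,
             0 gives the Group LASSO term only
    """
    # Within-group sparsity: L1 norm over all coefficients.
    l1_term = sum(abs(b) for b in beta)

    # Group sparsity: sqrt(group size) * L2 norm of each group's coefficients.
    squared_norms = {}
    sizes = {}
    for b, g in zip(beta, groups):
        squared_norms[g] = squared_norms.get(g, 0.0) + b * b
        sizes[g] = sizes.get(g, 0) + 1
    group_term = sum(
        math.sqrt(sizes[g]) * math.sqrt(squared_norms[g]) for g in sizes
    )

    return lam * (alpha * l1_term + (1.0 - alpha) * group_term)


def matthews_corrcoef(selected, truth):
    """MCC between selected variables and truly relevant ones (booleans)."""
    tp = sum(1 for s, t in zip(selected, truth) if s and t)
    tn = sum(1 for s, t in zip(selected, truth) if not s and not t)
    fp = sum(1 for s, t in zip(selected, truth) if s and not t)
    fn = sum(1 for s, t in zip(selected, truth) if not s and t)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0


beta = [3.0, 4.0, 0.0, 0.0]   # hypothetical fitted coefficients
groups = [0, 0, 1, 1]         # two groups of two variables each
penalty = sparse_group_lasso_penalty(beta, groups, lam=1.0, alpha=0.5)

selected = [b != 0.0 for b in beta]
truth = [True, False, False, False]  # hypothetical ground truth
mcc = matthews_corrcoef(selected, truth)
```

The same MCC function can score group selection if a group is counted as selected whenever any of its members has a nonzero coefficient; that convention is likewise an assumption, not something the abstract specifies.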

Source journal
Biometrical Journal (Biology: Mathematical & Computational Biology)
CiteScore: 3.20
Self-citation rate: 5.90%
Articles published: 119
Review time: 6-12 weeks
About the journal: Biometrical Journal publishes papers on statistical methods and their applications in life sciences including medicine, environmental sciences and agriculture. Methodological developments should be motivated by an interesting and relevant problem from these areas. Ideally the manuscript should include a description of the problem and a section detailing the application of the new methodology to the problem. Case studies, review articles and letters to the editors are also welcome. Papers containing only extensive mathematical theory are not suitable for publication in Biometrical Journal.