Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus

IF 2.5 · CAS Tier 3, Computer Science · JCR Q2, Automation & Control Systems
Gabriele Maroni, Loris Cannelli, Dario Piga
{"title":"Gradient-based bilevel optimization for multi-penalty Ridge regression through matrix differential calculus","authors":"Gabriele Maroni,&nbsp;Loris Cannelli,&nbsp;Dario Piga","doi":"10.1016/j.ejcon.2024.101150","DOIUrl":null,"url":null,"abstract":"<div><div>Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the trade-off between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm’s flexibility and potential for better generalization. In this paper, we address the problem of linear regression with <span><math><msub><mrow><mi>ℓ</mi></mrow><mrow><mn>2</mn></mrow></msub></math></span>-regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored for sparse model learning problems aiming at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that the proposed multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression in terms of <span><math><msup><mrow><mi>R</mi></mrow><mrow><mn>2</mn></mrow></msup></math></span> score both in a static regression and in a system identification problem. Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation, especially when handling a large number of input variables, with an improvement of more than an order of magnitude. Application to the identification of over-parameterized Linear Parameter-Varying models is also presented.</div></div>","PeriodicalId":50489,"journal":{"name":"European Journal of Control","volume":"81 ","pages":"Article 101150"},"PeriodicalIF":2.5000,"publicationDate":"2024-11-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"European Journal of Control","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0947358024002103","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Common regularization algorithms for linear regression, such as LASSO and Ridge regression, rely on a regularization hyperparameter that balances the trade-off between minimizing the fitting error and the norm of the learned model coefficients. As this hyperparameter is scalar, it can be easily selected via random or grid search optimizing a cross-validation criterion. However, using a scalar hyperparameter limits the algorithm's flexibility and potential for better generalization. In this paper, we address the problem of linear regression with ℓ2-regularization, where a different regularization hyperparameter is associated with each input variable. We optimize these hyperparameters using a gradient-based approach, wherein the gradient of a cross-validation criterion with respect to the regularization hyperparameters is computed analytically through matrix differential calculus. Additionally, we introduce two strategies tailored for sparse model learning problems aiming at reducing the risk of overfitting to the validation data. Numerical examples demonstrate that the proposed multi-hyperparameter regularization approach outperforms LASSO, Ridge, and Elastic Net regression in terms of R² score both in a static regression and in a system identification problem. Moreover, the analytical computation of the gradient proves to be more efficient in terms of computational time compared to automatic differentiation, especially when handling a large number of input variables, with an improvement of more than an order of magnitude. Application to the identification of over-parameterized Linear Parameter-Varying models is also presented.
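To make the bilevel idea concrete, below is a minimal Python/NumPy sketch of the core mechanism described in the abstract: one ℓ2 penalty per input variable, a closed-form inner ridge solution, and an analytical gradient of a hold-out validation loss with respect to the penalties. This is a simplified illustration, not the paper's exact algorithm: it uses a single train/validation split rather than a full cross-validation criterion, omits the two sparsity-oriented strategies, and all function names, the step size, and the toy data are assumptions made for the example.

```python
# Minimal sketch (illustrative, not the paper's exact algorithm): multi-penalty
# ridge regression with one l2 penalty per input variable, tuned by gradient
# descent on a hold-out validation loss with an analytically computed gradient.
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form multi-penalty ridge: w = (X'X + diag(lam))^{-1} X'y."""
    A = X.T @ X + np.diag(lam)
    return np.linalg.solve(A, X.T @ y), A

def val_loss_and_grad(lam, X_tr, y_tr, X_val, y_val):
    """Validation MSE and its analytical gradient w.r.t. the penalties lam.

    With w(lam) = A^{-1} X_tr' y_tr and A = X_tr'X_tr + diag(lam):
      dw/dlam_j = -A^{-1} e_j w_j, hence
      dL/dlam_j = -(2/n_val) * (A^{-1} X_val' r)_j * w_j, where r = X_val w - y_val.
    """
    n_val = len(y_val)
    w, A = fit_ridge(X_tr, y_tr, lam)
    r = X_val @ w - y_val
    loss = (r @ r) / n_val
    grad = -(2.0 / n_val) * np.linalg.solve(A, X_val.T @ r) * w
    return loss, grad

# Toy data with a sparse ground truth; gradient steps are taken in log-space
# so that every penalty stays positive.
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
w_true = np.concatenate([rng.standard_normal(3), np.zeros(d - 3)])
y = X @ w_true + 0.1 * rng.standard_normal(n)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

log_lam = np.zeros(d)          # start from lam_j = 1 for every input variable
lr = 50.0                      # step size chosen by hand for this toy example
for _ in range(200):
    lam = np.exp(log_lam)
    _, grad = val_loss_and_grad(lam, X_tr, y_tr, X_val, y_val)
    log_lam -= lr * grad * lam  # chain rule: dL/dlog_lam = dL/dlam * lam

loss, _ = val_loss_and_grad(np.exp(log_lam), X_tr, y_tr, X_val, y_val)
print("validation MSE:", loss)
print("learned penalties:", np.round(np.exp(log_lam), 2))
```

The gradient above is exactly the kind of expression the paper obtains via matrix differential calculus, only for the richer cross-validation criterion: since the inner ridge problem has a closed-form solution, differentiating it analytically avoids unrolling the solve through automatic differentiation, which is what yields the reported speed-up when the number of input variables (and hence of penalties) is large.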
Source Journal
European Journal of Control (Engineering & Technology: Automation & Control Systems)
CiteScore: 5.80
Self-citation rate: 5.90%
Articles per year: 131
Review time: 1 month
Aims and scope: The European Control Association (EUCA) has among its objectives the promotion of the development of the discipline. Apart from the European Control Conferences, the European Journal of Control is the Association's main channel for the dissemination of important contributions in the field. The aim of the Journal is to publish high-quality papers on the theory and practice of control and systems engineering. The scope of the Journal is wide and covers all aspects of the discipline, including methodologies, techniques and applications. Research in control and systems engineering is necessary to develop new concepts and tools which enhance our understanding and improve our ability to design and implement high-performance control systems. Submitted papers should stress the practical motivations and relevance of their results. The design and implementation of a successful control system requires the use of a range of techniques: modelling, robustness analysis, identification, optimization, control law design, numerical analysis, fault detection, and so on.