Integer constraints for enhancing interpretability in linear regression

IF 0.7 · CAS Zone 4 (Mathematics) · JCR Q4 (Operations Research & Management Science)
E. Priego, Alba V. Olivares-Nadal, Pepa Ramírez Cobo
DOI: 10.2436/20.8080.02.95 (https://doi.org/10.2436/20.8080.02.95)
Journal: Sort-Statistics and Operations Research Transactions, volume 27 1, pages 69-78
Publication date: 2020-04-01
Publication type: Journal Article
Open access: no
Source: Semanticscholar
Citations: 3

Abstract

Integer constraints for enhancing interpretability in linear regression
One of the main challenges researchers face is identifying the most relevant features in a prediction model. As a consequence, many regularized methods seeking sparsity have flourished. Although sparse, their solutions may not be interpretable in the presence of spurious coefficients and correlated features. In this paper we aim to enhance interpretability in linear regression in the presence of multicollinearity by (i) forcing the signs of the estimated coefficients to be consistent with the signs of the correlations between predictors, and (ii) avoiding spurious coefficients, so that only significant features are represented in the model. We address this by modelling both requirements as constraints and adding them to an optimization problem that expresses some estimation procedure, such as ordinary least squares or the lasso. The resulting constrained regression models are Mixed Integer Quadratic Problems. Numerical experiments on real and simulated datasets show that tightening the search space of some standard linear regression models by adding the constraints modelling (i) and/or (ii) helps to improve the sparsity and interpretability of the solutions while keeping predictive quality competitive.
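The abstract does not state the constraints themselves. As an illustration only, and not the authors' formulation, requirements (i) and (ii) could be written with binary sign indicators and a big-M bound, as sketched below. The symbols $z_j^{+}$, $z_j^{-}$, $M$, $\delta$, $\tau$ and $\rho_{jk}$ are all assumptions introduced for this sketch: $\delta > 0$ is a minimum magnitude for a nonzero coefficient, $M$ is an upper bound on $|\beta_j|$, $\rho_{jk}$ is the sample correlation between predictors $j$ and $k$, and $\tau$ is a threshold above which two predictors are treated as strongly correlated.

```latex
\begin{align*}
\min_{\beta,\, z^{+},\, z^{-}} \quad
  & \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
  && \text{($\lambda = 0$ gives ordinary least squares)} \\
\text{s.t.} \quad
  & \delta z_j^{+} - M z_j^{-} \;\le\; \beta_j \;\le\; M z_j^{+} - \delta z_j^{-}
  && \forall j
  && \text{(ii): } \beta_j = 0 \text{ or } |\beta_j| \ge \delta \\
  & z_j^{+} + z_j^{-} \le 1
  && \forall j \\
  & z_j^{+} + z_k^{-} \le 1, \quad z_j^{-} + z_k^{+} \le 1
  && \forall (j,k):\ \rho_{jk} \ge \tau
  && \text{(i): same sign} \\
  & z_j^{+} + z_k^{+} \le 1, \quad z_j^{-} + z_k^{-} \le 1
  && \forall (j,k):\ \rho_{jk} \le -\tau
  && \text{(i): opposite sign} \\
  & z_j^{+},\, z_j^{-} \in \{0, 1\}
  && \forall j
\end{align*}
```

With $z_j^{+} = 1$ the first constraint reduces to $\delta \le \beta_j \le M$, with $z_j^{-} = 1$ it reduces to $-M \le \beta_j \le -\delta$, and with both indicators at zero it forces $\beta_j = 0$. The objective is quadratic and the indicators are binary, so under these assumptions the problem is a Mixed Integer Quadratic Problem, in line with the abstract.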
Source journal
Sort-Statistics and Operations Research Transactions
Category: Management Science - Statistics & Probability
CiteScore: 3.10
Self-citation rate: 0.00%
Articles published: 0
Review time: >12 weeks
About the journal: SORT (Statistics and Operations Research Transactions), formerly Qüestiió, is an international journal launched in 2003. It is published twice yearly, in English, by the Statistical Institute of Catalonia (Idescat). The journal is co-edited by the Universitat Politècnica de Catalunya, Universitat de Barcelona, Universitat Autònoma de Barcelona, Universitat de Girona, Universitat Pompeu Fabra and Universitat de Lleida, with the co-operation of the Spanish Section of the International Biometric Society and the Catalan Statistical Society. SORT promotes the publication of original articles of a methodological or applied nature, or motivated by an applied problem, in statistics, operations research, official statistics or biometrics, as well as book reviews. Authors are encouraged to include an example of a real data set in their manuscripts.