Integer constraints for enhancing interpretability in linear regression

IF 0.7 · Zone 4 (Mathematics) · Q4 OPERATIONS RESEARCH & MANAGEMENT SCIENCE

Sort-Statistics and Operations Research Transactions Pub Date : 2020-04-01 DOI:10.2436/20.8080.02.95

E. Priego, Alba V. Olivares-Nadal, Pepa Ramírez Cobo

Volume: 27 1 · Pages: 69-78 · Type: Journal Article · URL: https://doi.org/10.2436/20.8080.02.95

Citations: 3

Abstract

One of the main challenges researchers face is to identify the most relevant features in a prediction model. As a consequence, many regularized methods seeking sparsity have flourished. Although sparse, their solutions may not be interpretable in the presence of spurious coefficients and correlated features. In this paper we aim to enhance interpretability in linear regression in the presence of multicollinearity by: (i) forcing the sign of the estimated coefficients to be consistent with the sign of the correlations between predictors, and (ii) avoiding spurious coefficients so that only significant features are represented in the model. This is addressed by modelling (i) and (ii) as constraints and adding them to the optimization problem that expresses an estimation procedure such as ordinary least squares or the lasso. The resulting constrained regression models are Mixed Integer Quadratic Problems. The numerical experiments carried out on real and simulated datasets show that tightening the search space of some standard linear regression models by adding the constraints modelling (i) and/or (ii) helps to improve the sparsity and interpretability of the solutions while keeping predictive quality competitive.
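
To make the idea in the abstract concrete, the block below sketches one way constraints of type (i) and (ii) could be attached to a least-squares or lasso objective so that the result is a Mixed Integer Quadratic Problem. It is an illustrative formulation under stated assumptions, not necessarily the one used in the paper: the split $\beta_j = \beta_j^+ - \beta_j^-$, the binary indicators $z_j^\pm$, the big-M bound $M$, the correlation cutoff $\tau$, and the minimum magnitude $\varepsilon$ are all hypothetical choices introduced here.

```latex
% Illustrative sketch only: M, tau, epsilon and the z-variables are
% assumptions, not taken from the abstract.
\begin{aligned}
\min_{\beta^{+},\,\beta^{-},\,z^{+},\,z^{-}} \quad
  & \bigl\| y - X(\beta^{+} - \beta^{-}) \bigr\|_2^2
    + \lambda \sum_{j=1}^{p} \bigl(\beta_j^{+} + \beta_j^{-}\bigr) \\
\text{s.t.} \quad
  & \varepsilon\, z_j^{+} \le \beta_j^{+} \le M z_j^{+},
    \qquad
    \varepsilon\, z_j^{-} \le \beta_j^{-} \le M z_j^{-},
    && j = 1,\dots,p
    && \text{(ii: zero or at least } \varepsilon \text{ in magnitude)} \\
  & z_j^{+} + z_j^{-} \le 1,
    && j = 1,\dots,p
    && \text{(one sign per coefficient)} \\
  & z_j^{+} + z_k^{-} \le 1, \quad z_j^{-} + z_k^{+} \le 1,
    && \text{if } \rho_{jk} \ge \tau
    && \text{(i: no opposite signs)} \\
  & z_j^{+} + z_k^{+} \le 1, \quad z_j^{-} + z_k^{-} \le 1,
    && \text{if } \rho_{jk} \le -\tau
    && \text{(i: no equal signs)} \\
  & z_j^{+},\, z_j^{-} \in \{0,1\},
    && j = 1,\dots,p .
\end{aligned}
```

Here $\rho_{jk}$ denotes the sample correlation between predictors $j$ and $k$. Setting $\lambda = 0$ gives a constrained ordinary least squares problem and $\lambda > 0$ a constrained lasso; in both cases the objective is convex quadratic and the only non-convexity comes from the binary variables, which is what makes the model a Mixed Integer Quadratic Problem.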


Source journal

Sort-Statistics and Operations Research Transactions (Management Science; Statistics & Probability)

CiteScore

3.10

Self-citation rate

0.00%

Articles published

Review time

>12 weeks

About the journal: SORT (Statistics and Operations Research Transactions), formerly Qüestiió, is an international journal launched in 2003. It is published twice yearly, in English, by the Statistical Institute of Catalonia (Idescat). The journal is co-edited by the Universitat Politècnica de Catalunya, Universitat de Barcelona, Universitat Autònoma de Barcelona, Universitat de Girona, Universitat Pompeu Fabra and Universitat de Lleida, with the co-operation of the Spanish Section of the International Biometric Society and the Catalan Statistical Society. SORT promotes the publication of original articles of a methodological or applied nature, or motivated by an applied problem, in statistics, operations research, official statistics or biometrics, as well as book reviews. We encourage authors to include an example of a real data set in their manuscripts.