{"title":"多重共线性条件下线性和非要素线性回归模型的解释算法","authors":"M.P. Bazilevski","doi":"10.22213/2410-9304-2023-3-40-47","DOIUrl":null,"url":null,"abstract":"When constructing machine learning models, more and more researchers come to understanding that in addition to the accuracy of the model, its interpretability is also important, that means the degree of understandability to a person. Currently, a new field is being formed in science - interpretable machine learning. This article is devoted to the study of linear regression models interpretation questions. Traditionally, linear regressions are interpreted in terms of weak correlation of explanatory variables. In this case, according to a well-known pattern, it is possible to explain the influence of each input variable on the output variable. Often the explanatory variables are highly correlated with each other. In such a situation, it is recommended to exclude strongly correlated variables, that leads, firstly, to a decrease in the model quality, and secondly, to integrity loss of the study and the interpretation of a process or phenomenon under study. In this paper, we propose an algorithm for interpreting a linear regression constructed for any degree of explanatory variablecorrelation. The algorithm gives the traditional interpretation of linear regression with weak correlation of all explanatory variables, while in case of strong correlation, as it follows from the theorem proved in the article, regression interpretation with several variables is reduced to interpreting an equation with one variable, without losing information about the relationships between pairs of explanatory variables. The algorithm can be used not only for simple linear regressions, but also for non-elementary ones, containing, in addition to explanatory variables, their pairs transformed by means of binary operations min and max. The operation of the algorithm is demonstrated on specific examples.","PeriodicalId":238017,"journal":{"name":"Intellekt. 
Sist. Proizv.","volume":"72 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Interpretation Algorithm of Linear and Non-elementary Linear Regression Models Under Multicollinearity Conditions\",\"authors\":\"M.P. Bazilevski\",\"doi\":\"10.22213/2410-9304-2023-3-40-47\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"When constructing machine learning models, more and more researchers come to understanding that in addition to the accuracy of the model, its interpretability is also important, that means the degree of understandability to a person. Currently, a new field is being formed in science - interpretable machine learning. This article is devoted to the study of linear regression models interpretation questions. Traditionally, linear regressions are interpreted in terms of weak correlation of explanatory variables. In this case, according to a well-known pattern, it is possible to explain the influence of each input variable on the output variable. Often the explanatory variables are highly correlated with each other. In such a situation, it is recommended to exclude strongly correlated variables, that leads, firstly, to a decrease in the model quality, and secondly, to integrity loss of the study and the interpretation of a process or phenomenon under study. In this paper, we propose an algorithm for interpreting a linear regression constructed for any degree of explanatory variablecorrelation. The algorithm gives the traditional interpretation of linear regression with weak correlation of all explanatory variables, while in case of strong correlation, as it follows from the theorem proved in the article, regression interpretation with several variables is reduced to interpreting an equation with one variable, without losing information about the relationships between pairs of explanatory variables. 
The algorithm can be used not only for simple linear regressions, but also for non-elementary ones, containing, in addition to explanatory variables, their pairs transformed by means of binary operations min and max. The operation of the algorithm is demonstrated on specific examples.\",\"PeriodicalId\":238017,\"journal\":{\"name\":\"Intellekt. Sist. Proizv.\",\"volume\":\"72 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-10-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Intellekt. Sist. Proizv.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.22213/2410-9304-2023-3-40-47\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intellekt. Sist. Proizv.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.22213/2410-9304-2023-3-40-47","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Interpretation Algorithm of Linear and Non-elementary Linear Regression Models Under Multicollinearity Conditions
When constructing machine learning models, more and more researchers are coming to understand that, in addition to a model's accuracy, its interpretability also matters, that is, the degree to which a person can understand it. A new field is currently forming in science: interpretable machine learning. This article studies questions of interpreting linear regression models. Traditionally, linear regressions are interpreted under the assumption that the explanatory variables are weakly correlated; in that case, a well-known pattern makes it possible to explain the influence of each input variable on the output variable. In practice, however, the explanatory variables are often highly correlated with one another. In such a situation, the usual recommendation is to exclude the strongly correlated variables, which, first, reduces the quality of the model and, second, undermines the integrity of the study and of the interpretation of the process or phenomenon under investigation. This paper proposes an algorithm for interpreting a linear regression constructed for any degree of correlation among the explanatory variables. When all explanatory variables are weakly correlated, the algorithm yields the traditional interpretation of the linear regression; when they are strongly correlated, then, as follows from the theorem proved in the article, interpreting a regression with several variables reduces to interpreting an equation with one variable, without losing information about the relationships between pairs of explanatory variables. The algorithm can be applied not only to simple linear regressions but also to non-elementary ones, which contain, in addition to the explanatory variables, pairs of them transformed by the binary operations min and max. The operation of the algorithm is demonstrated on specific examples.
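The core reduction described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm; it is a hypothetical NumPy example (all data and variable names are illustrative) showing why strong correlation between two explanatory variables lets a two-variable regression be read through a one-variable equation: if x2 is well approximated by its own pairwise regression on x1, substituting that approximation into the fitted model collapses it to a single-variable form while the pairwise relation between x1 and x2 is retained explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
# x2 is almost a linear function of x1: strong multicollinearity by construction
x2 = 2.0 + 3.0 * x1 + rng.normal(scale=0.01, size=n)
y = 1.0 + 4.0 * x1 + 5.0 * x2 + rng.normal(scale=0.1, size=n)

# Full two-variable OLS fit: y ~ b0 + b1*x1 + b2*x2
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
b0, b1, b2 = b

# Pairwise regression x2 ~ a0 + a1*x1 records the relation between the pair
a0, a1 = np.linalg.lstsq(np.column_stack([np.ones(n), x1]), x2, rcond=None)[0]

# Substitute x2 ≈ a0 + a1*x1 into the fitted model:
# y ≈ (b0 + b2*a0) + (b1 + b2*a1)*x1, a one-variable equation
c0 = b0 + b2 * a0
c1 = b1 + b2 * a1
y_one = c0 + c1 * x1

print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])
print("max gap between full fit and one-variable form:",
      np.max(np.abs(y_one - X @ b)))

# A "non-elementary" specification would additionally include min/max features
# of variable pairs, e.g. np.minimum(x1, x2) and np.maximum(x1, x2), as extra columns.
```

The one-variable form reproduces the full fit up to the (tiny) residual of the pairwise regression, which is the sense in which no information about the pairwise relationship is lost.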