Model Loss and Distribution Analysis of Regression Problems in Machine Learning
Authors: Nan Yang, Zeyu Zheng, Tianran Wang
Published in: International Conference on Machine Learning and Computing, 2019-02-22
DOI: 10.1145/3318299.3318367 (https://doi.org/10.1145/3318299.3318367)
Citations: 7
Abstract
Machine learning regression models are commonly based on an assumption of normality. In this paper, we study the probability distribution underlying the regression model and how the values to which different loss functions converge affect that probabilistic model. Drawing on the idea of robust regression and assuming homogeneous variance, we derive the statistical solution of the two-dimensional regression problem using the least squares method. We then obtain the parameters of the probabilistic model by maximum likelihood estimation. To compare the parameters obtained by the two methods, we use the convergence values of the L1 and L2 loss functions for regression verification. Rigorous mathematical and statistical derivation yields two main conclusions. First, when the data are normally distributed and the variance is homogeneous, the probabilistic model follows a multivariate Gaussian distribution. Second, for a model that follows the multivariate Gaussian distribution, the choice of loss function has little influence on the parameter estimates under the conditions of the law of large numbers; that is, the multivariate Gaussian model tolerates the choice of loss function well.
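The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the true parameters, sample size, noise level, and the use of iteratively reweighted least squares (IRLS) to approximate the L1 minimizer are all assumptions chosen for illustration. It fits a line y = a*x + b to data with Gaussian noise of constant variance, once under the L2 loss (ordinary least squares, which coincides with the Gaussian maximum likelihood estimate) and once under the L1 loss, so the two sets of estimates can be compared.

```python
import random


def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for the line y = a*x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_l2(xs, ys):
    """Minimize the sum of squared residuals (ordinary least squares)."""
    return weighted_line_fit(xs, ys, [1.0] * len(xs))


def fit_l1(xs, ys, iters=50, eps=1e-6):
    """Approximate the least-absolute-deviation fit with IRLS.

    Each pass reweights the points by 1/|residual|, so the weighted
    squared loss mimics the absolute loss at the current estimate.
    """
    a, b = fit_l2(xs, ys)  # start from the L2 solution
    for _ in range(iters):
        ws = [1.0 / max(abs(y - (a * x + b)), eps) for x, y in zip(xs, ys)]
        a, b = weighted_line_fit(xs, ys, ws)
    return a, b


def simulate(n, a_true=2.0, b_true=1.0, sigma=1.0, seed=0):
    """Hypothetical data: linear signal plus homoscedastic Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [a_true * x + b_true + rng.gauss(0.0, sigma) for x in xs]
    return xs, ys


xs, ys = simulate(5000)
a2, b2 = fit_l2(xs, ys)
a1, b1 = fit_l1(xs, ys)
print(f"L2 fit: a={a2:.4f}, b={b2:.4f}")
print(f"L1 fit: a={a1:.4f}, b={b1:.4f}")
```

With a large sample, both fits should land near the assumed true values (a = 2, b = 1) and near each other, which is the kind of agreement the second conclusion describes. Changing the `n` argument of `simulate` to a small value shows how the two estimates can drift apart when the sample is small.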