Model Loss and Distribution Analysis of Regression Problems in Machine Learning

International Conference on Machine Learning and Computing. Pub Date: 2019-02-22. DOI: 10.1145/3318299.3318367

Nan Yang, Zeyu Zheng, Tianran Wang


Citations: 7

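Abstract

Machine learning regression models are based on the assumption of a normal distribution. This paper studies the probability distribution of machine learning regression models and how the converged values of different loss functions affect the probabilistic model. Following the idea of robust regression and assuming homogeneous variance (homoscedasticity), we derive the statistical solution of a two-dimensional regression problem by the least squares method, and we obtain the parameters of the probabilistic model by maximum likelihood estimation. To compare the parameters produced by the two methods, we verify the regression using the converged values of the L1 and L2 loss functions. Rigorous mathematical and statistical derivation yields two main conclusions. First, when the data are normally distributed and the variance is homogeneous, the probability model follows a multivariate Gaussian distribution. Second, under the law of large numbers, the choice of loss function has little influence on the parameter estimates of a model that follows the multivariate Gaussian distribution; that is, the multivariate Gaussian model is tolerant of the choice of loss function.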
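The L1 versus L2 comparison described above can be illustrated with a small sketch. It is not the authors' code, and everything in it is an assumption for illustration: the data are hypothetical and synthetic (y = 2x + 1 plus N(0, 1) noise with constant variance), and the helper names (`ols_fit`, `weighted_ls_fit`, `lad_fit`, `l1_loss`, `l2_loss`) are hypothetical. The L2 fit uses the closed-form least-squares solution, which coincides with the maximum likelihood estimate of slope and intercept under i.i.d. Gaussian noise, because maximizing the Gaussian log-likelihood is equivalent to minimizing the sum of squared residuals. The L1 fit uses iteratively reweighted least squares (IRLS), one standard way to minimize the sum of absolute residuals; the paper does not say which L1 solver it used. Only the Python standard library is used.

```python
import random


def ols_fit(xs, ys):
    """Closed-form least-squares (L2) fit of y = a*x + b.

    Under i.i.d. Gaussian noise with constant variance this is also the
    maximum likelihood estimate of (a, b).
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    a = sxy / sxx
    return a, y_mean - a * x_mean


def weighted_ls_fit(xs, ys, ws):
    """Weighted least squares for y = a*x + b via the 2x2 normal equations."""
    sw = sum(ws)
    swx = sum(w * x for w, x in zip(ws, xs))
    swy = sum(w * y for w, y in zip(ws, ys))
    swxx = sum(w * x * x for w, x in zip(ws, xs))
    swxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * swxx - swx * swx
    a = (sw * swxy - swx * swy) / det
    b = (swxx * swy - swx * swxy) / det
    return a, b


def lad_fit(xs, ys, iters=100, eps=1e-6):
    """Least-absolute-deviations (L1) fit via IRLS.

    Weights 1/max(|r|, eps) make each step minimize a quadratic upper bound
    of a slightly smoothed L1 loss; eps avoids division by zero.
    """
    a, b = ols_fit(xs, ys)  # start from the L2 solution
    for _ in range(iters):
        ws = [1.0 / max(abs(y - a * x - b), eps) for x, y in zip(xs, ys)]
        a, b = weighted_ls_fit(xs, ys, ws)
    return a, b


def l1_loss(xs, ys, a, b):
    return sum(abs(y - a * x - b) for x, y in zip(xs, ys))


def l2_loss(xs, ys, a, b):
    return sum((y - a * x - b) ** 2 for x, y in zip(xs, ys))


# Hypothetical synthetic data: y = 2x + 1 + N(0, 1) noise, homogeneous variance.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(2000)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]

a2, b2 = ols_fit(xs, ys)
a1, b1 = lad_fit(xs, ys)
sigma2_mle = l2_loss(xs, ys, a2, b2) / len(xs)  # Gaussian MLE of the noise variance

print(f"L2 fit: a={a2:.3f}, b={b2:.3f}, sigma^2={sigma2_mle:.3f}")
print(f"L1 fit: a={a1:.3f}, b={b1:.3f}")
```

With a large sample and Gaussian, homoscedastic noise, both fits should land close to the true slope and intercept and close to each other, which is the kind of loss-function tolerance the abstract describes. With heavy-tailed noise or outliers the two estimates would be expected to diverge more, since the L1 fit is less sensitive to large residuals.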


