Model Loss and Distribution Analysis of Regression Problems in Machine Learning

Nan Yang, Zeyu Zheng, Tianran Wang
Published in: International Conference on Machine Learning and Computing, 2019-02-22
DOI: 10.1145/3318299.3318367 (https://doi.org/10.1145/3318299.3318367)
Citations: 7

Abstract

Machine learning regression models are based on the assumption of a normal distribution. This paper studies the probability distribution underlying such models and how the values to which different loss functions converge affect the fitted probabilistic model. Drawing on the idea of robust regression and assuming homogeneous variance, we derive the statistical solution of a two-dimensional regression problem by the least squares method, and obtain the parameters of the probabilistic model by maximum likelihood estimation. To compare the parameters produced by the two methods, we use the convergence values of the L1 and L2 loss functions for regression verification. A rigorous mathematical and statistical derivation yields two conclusions. First, when the data are normally distributed and the variance is homogeneous, the probabilistic model follows a multivariate Gaussian distribution. Second, for a model that follows the multivariate Gaussian distribution, the choice of loss function has little influence on the parameter estimates under the law of large numbers; that is, the multivariate Gaussian model tolerates different loss functions well.
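The second conclusion can be illustrated numerically. The following is a minimal sketch, not the authors' experiment: the synthetic data, the true parameters (1.0, 2.0), the sample size, the noise level, and the choice of iteratively reweighted least squares as the L1 solver are all assumptions made for illustration. It fits one line to Gaussian, constant-variance data under both losses and returns the two parameter estimates for comparison.

```python
import numpy as np


def fit_l2(X, y):
    """Minimize the sum of squared residuals (L2 loss) by ordinary least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def fit_l1(X, y, n_iter=50, eps=1e-6):
    """Approximately minimize the sum of absolute residuals (L1 loss).

    Uses iteratively reweighted least squares: each pass solves a weighted
    L2 problem whose weights are 1 / |residual| from the previous pass.
    The solver choice is an assumption of this sketch.
    """
    beta = fit_l2(X, y)  # start from the L2 solution
    for _ in range(n_iter):
        residual = np.abs(y - X @ beta)
        sqrt_w = np.sqrt(1.0 / np.maximum(residual, eps))  # eps avoids division by zero
        beta, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return beta


def demo(n=20000, seed=0):
    """Generate y = 1 + 2x + Gaussian noise with constant variance, then fit both ways."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 3.0, size=n)
    X = np.column_stack([np.ones(n), x])  # intercept column plus one feature
    true_beta = np.array([1.0, 2.0])      # hypothetical ground truth
    y = X @ true_beta + rng.normal(0.0, 1.0, size=n)
    return true_beta, fit_l2(X, y), fit_l1(X, y)


true_beta, beta_l2, beta_l1 = demo()
print("true parameters:", true_beta)
print("L2 estimate:    ", beta_l2)
print("L1 estimate:    ", beta_l1)
```

Under these assumptions both estimates should land close to the true parameters and to each other when the sample is large. With heavy-tailed or outlier-contaminated noise the two fits would generally diverge, which is the setting robust regression targets.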