Regression Model for Better Generalization and Regression Analysis

Mohiuddeen Khan, Kanishk Srivastava
DOI: 10.1145/3380688.3380691 (https://doi.org/10.1145/3380688.3380691)
Published in: Proceedings of the 4th International Conference on Machine Learning and Soft Computing
Publication date: 2020-01-17
Citations: 3

Abstract

Regression models such as polynomial regression do not always optimize well on the training instances. A low polynomial degree gives high bias (underfitting) and poor generalization to new instances, while a high degree gives high variance (overfitting). A low-degree hypothesis curve cannot fit all the training instances because the curvature of the data changes repeatedly and the curve alternately rises and falls around the local extrema in the plot of the data points. Better optimization and generalization can be achieved by splitting the hypothesis curve at its extrema, i.e. the local maxima and local minima, and fitting a separate regression model on each maximum-to-minimum or minimum-to-maximum interval. Because an interval contains no local extremum, the curvature changes very little within it, so fewer training instances are needed to fit each model. Training on fewer instances also reduces the running time, which makes the model much less computationally expensive. Tested on datasets from the UCI Machine Learning Repository, polynomial regression gave an accuracy of 53.47% and our algorithm 92.06% on the Combined Cycle Power Plant Data Set [1]; on the Real estate valuation Data Set [2], polynomial regression gave 85.41% and our algorithm 96.33%. The approach may benefit further mathematical work on the bias-variance trade-off, cost minimization, and curve fitting in statistics.
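
The abstract does not say how the extrema are located or which polynomial degree is used on each interval. The following is only a minimal sketch of the general idea, under several assumptions that are not from the paper: the data are one-dimensional and noise-free, extrema are detected from sign changes in consecutive differences, and a cubic is fitted on every interval. The helper names `find_extrema`, `fit_piecewise`, and `predict_piecewise` are hypothetical.

```python
import numpy as np


def find_extrema(y):
    """Return indices of local maxima/minima of y (y must be ordered by x).

    Assumption: no flat runs, i.e. consecutive values always differ.
    """
    slope_sign = np.sign(np.diff(y))
    # A sign flip between neighbouring differences marks a local extremum.
    return np.where(slope_sign[:-1] * slope_sign[1:] < 0)[0] + 1


def fit_piecewise(x, y, degree=3):
    """Fit one polynomial per extremum-to-extremum interval."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    cuts = np.concatenate(([0], find_extrema(y), [len(x) - 1]))
    segments = []
    for lo, hi in zip(cuts[:-1], cuts[1:]):
        xs, ys = x[lo:hi + 1], y[lo:hi + 1]  # neighbours share the extremum
        deg = min(degree, len(xs) - 1)       # guard against tiny intervals
        segments.append((x[lo], x[hi], np.polyfit(xs, ys, deg)))
    return segments


def predict_piecewise(segments, x_new):
    """Evaluate the polynomial of whichever interval each point falls in."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    # Upper bounds of all intervals except the last act as breakpoints.
    edges = np.array([upper for _, upper, _ in segments[:-1]])
    idx = np.searchsorted(edges, x_new, side="right")
    out = np.empty_like(x_new)
    for k, (_, _, coeffs) in enumerate(segments):
        mask = idx == k
        out[mask] = np.polyval(coeffs, x_new[mask])
    return out


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic illustration (not one of the paper's datasets): a sine wave
# with three interior extrema on [0, 10].
x = np.linspace(0.0, 10.0, 201)
y = np.sin(x)

segments = fit_piecewise(x, y, degree=3)
piece_rmse = rmse(y, predict_piecewise(segments, x))

# Baseline: a single cubic fitted to the whole range.
global_rmse = rmse(y, np.polyval(np.polyfit(x, y, 3), x))

print(f"intervals: {len(segments)}")
print(f"piecewise cubic RMSE: {piece_rmse:.5f}")
print(f"single cubic RMSE:    {global_rmse:.5f}")
```

On noisy data a difference-based detector like this one would report many spurious extrema, so some smoothing step would presumably be needed first. The sketch also reports RMSE rather than the accuracy figure quoted in the abstract, since the abstract does not define how that accuracy is computed.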