MULTIPLE REGRESSION ANALYSIS SYSTEM IN MACHINE LEARNING AND ESTIMATING EFFECTS OF DATA TRANSFORMATION&NORMALIZATION

Anadolu University Journal of Science and Technology-A Applied Sciences and Engineering. Published: 2018-12-29. DOI: 10.30931/jetas.475215

A. Sayli, Ceyda Akbulut, Kemal Kosuta


Abstract

Machine learning is a recent topic in data analysis, and its practitioners are called "Data Scientists", now a highly sought-after job title in computing. This study has two aims. The first is to implement a multiple regression analysis system, developed with Python 3 on the Anaconda platform under the Ubuntu operating system, that builds a model for each attribute so that its values can be estimated and future decisions taken with less risk, drawing on past experience hidden in accumulated data. The second is to determine the effects of data transformation and min-max normalization, applied during data preparation before the models are built. After implementing the system, we test it by determining the best estimation model for each attribute of the vehicles sold in five European countries between 1970 and 1999. We construct six versions of the original dataset and use them to build regression models for further estimation. Finally, we compute the R-squared value of each constructed model and compare the models according to these values. The computational results are promising: they indicate that the system can be used efficiently and that data transformation and min-max normalization have a significant effect for some attributes.
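The workflow described in the abstract (build several versions of the data, fit a multiple regression model on each, and compare the models by R-squared) can be illustrated with a short sketch. This is not the authors' system: it is plain Python under stated assumptions. The data are synthetic stand-ins for one vehicle attribute and two predictors, the logarithm is an assumed example of a "data transformation", and the helper names (`min_max_normalize`, `fit_ols`, `r_squared`) are hypothetical.

```python
"""Minimal multiple-regression sketch: OLS fit, min-max normalization, R-squared.

Illustrative only; not the authors' system. All data below are synthetic (hypothetical).
"""
import math


def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0 to avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(columns, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk via the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]  # intercept first
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, columns):
    """Evaluate the fitted linear model on each observation."""
    n = len(columns[0])
    return [coef[0] + sum(b * col[i] for b, col in zip(coef[1:], columns)) for i in range(n)]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical stand-in for one target attribute and two predictor attributes.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2.0, 3.5, 1.0, 4.0, 6.5, 5.0, 8.0, 7.5, 9.0, 10.5]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.2]
y = [1.5 + 2.0 * a - 0.7 * b + e for a, b, e in zip(x1, x2, noise)]

# Three versions of the predictors: raw, min-max normalized, log-transformed.
versions = {
    "raw": [x1, x2],
    "min-max": [min_max_normalize(x1), min_max_normalize(x2)],
    "log": [[math.log(v) for v in x1], [math.log(v) for v in x2]],
}
scores = {}
for name, cols in versions.items():
    coef = fit_ols(cols, y)
    scores[name] = r_squared(y, predict(coef, cols))
    print(f"{name:8s} R^2 = {scores[name]:.4f}")
```

In this sketch the raw and min-max versions yield the same R-squared, because min-max scaling is an affine change of each predictor and the model includes an intercept; the log version, being a nonlinear transformation, generally gives a different value. Solving the normal equations directly keeps the example dependency-free; a QR- or SVD-based solver is numerically safer for larger or more collinear datasets.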

