Perbandingan Kinerja Metode Regresi K-Nearest Neighbor dan Metode Regresi Linear Berganda pada Data Boston Housing

Jambura Journal of Probability and Statistics Pub Date : 2023-05-31 DOI:10.34312/jjps.v4i1.18948

Lutfi Sivana Ihzaniah, Adi Setiawan, R. W. N. Wijaya


Abstract

This study compares the performance of K-Nearest Neighbor (KNN) regression and multiple linear regression on the Boston Housing data. Performance is measured by MAE, RMSE, MAPE, and R². KNN predicts a value for an object from its closest training examples, whereas multiple linear regression is a forecasting technique that involves more than one independent variable. The two methods are compared on the basis of the Mean Absolute Percentage Error (MAPE). Two definitions of distance are used: Euclidean and Minkowski. The K value in KNN sets the number of nearest neighbors examined to determine the value of the dependent variable; this study uses K values from 1 to 10 for each test-data proportion and each distance definition. Test data made up 20%, 30%, and 40% of the data for both methods. The best MAPE obtained by KNN regression was 12.89% at K = 3 with Euclidean distance and 13.22% at K = 3 with Minkowski distance, while the best MAPE for multiple linear regression was 17.17%. Because its MAPE is smaller, KNN regression is the better of the two methods.
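
As a rough illustration of the quantities named in the abstract, the following sketch implements KNN regression with a Minkowski distance (order p = 2 gives the Euclidean distance) and the four metrics MAE, RMSE, MAPE, and R². It is not the authors' code: the function names, the toy data, and the choice to average the K nearest targets with equal weight are assumptions made for illustration, and the abstract does not state which Minkowski order the study used.

```python
import math


def minkowski(a, b, p=2.0):
    """Minkowski distance of order p; p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=3, p=2.0):
    """Predict the target for `query` as the unweighted mean of the
    targets of its k nearest training points (an assumed weighting)."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: minkowski(pair[0], query, p),
    )
    nearest = [y for _, y in ranked[:k]]
    return sum(nearest) / len(nearest)


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - q) for t, q in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - q) ** 2 for t, q in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.
    Assumes no true value is zero."""
    return 100.0 * sum(
        abs((t - q) / t) for t, q in zip(y_true, y_pred)
    ) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - q) ** 2 for t, q in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up one-feature data, NOT the Boston Housing data.
    train_X = [[1.0], [2.0], [3.0], [4.0], [5.0]]
    train_y = [10.0, 20.0, 30.0, 40.0, 50.0]
    test_X = [[2.0], [4.0]]
    test_y = [21.0, 39.0]

    # Scan several K values, as the study does for K = 1..10.
    for k in range(1, 4):
        preds = [knn_predict(train_X, train_y, x, k=k) for x in test_X]
        print(k, round(mape(test_y, preds), 2))
```

Scanning K and the distance order in a loop like this mirrors the comparison described above: the setting with the smallest MAPE on the held-out data is the one reported as best.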

