{"title":"研究最小-最大数据归一化对不同相似性度量下k -近邻回归性能的影响","authors":"Peshawa J. Muhammad Ali","doi":"10.14500/aro.10955","DOIUrl":null,"url":null,"abstract":" K-nearest neighbor (KNN) is a lazy supervised learning algorithm, which depends on computing the similarity between the target and the closest neighbor(s). On the other hand, min-max normalization has been reported as a useful method for eliminating the impact of inconsistent ranges among attributes on the efficiency of some machine learning models. The impact of min-max normalization on the performance of KNN models is still not clear, and it needs more investigation. Therefore, this research examines the impacts of the min-max normalization method on the regression performance of KNN models utilizing eight different similarity measures, which are City block, Euclidean, Chebychev, Cosine, Correlation, Hamming, Jaccard, and Mahalanobis. Five benchmark datasets have been used to test the accuracy of the KNN models with the original dataset and the normalized dataset. Mean squared error (MSE) has been utilized as a performance indicator to compare the results. It’s been concluded that the impact of min-max normalization on the KNN models utilizing City block, Euclidean, Chebychev, Cosine, and Correlation depends on the nature of the dataset itself, therefore, testing models on both original and normalized datasets are recommended. The performance of KNN models utilizing Hamming, Jaccard, and Mahalanobis makes no difference by adopting min-max normalization because of their ratio nature, and dataset covariance involvement in the similarity calculations. Results showed that Mahalanobis outperformed the other seven similarity measures. This research is better than its peers in terms of reliability, and quality because it depended on testing different datasets from different application fields.","PeriodicalId":8398,"journal":{"name":"ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY","volume":null,"pages":null},"PeriodicalIF":1.2000,"publicationDate":"2022-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Investigating the Impact of Min-Max Data Normalization on the Regression Performance of K-Nearest Neighbor with Different Similarity Measurements\",\"authors\":\"Peshawa J. Muhammad Ali\",\"doi\":\"10.14500/aro.10955\",\"DOIUrl\":null,\"url\":null,\"abstract\":\" K-nearest neighbor (KNN) is a lazy supervised learning algorithm, which depends on computing the similarity between the target and the closest neighbor(s). On the other hand, min-max normalization has been reported as a useful method for eliminating the impact of inconsistent ranges among attributes on the efficiency of some machine learning models. The impact of min-max normalization on the performance of KNN models is still not clear, and it needs more investigation. Therefore, this research examines the impacts of the min-max normalization method on the regression performance of KNN models utilizing eight different similarity measures, which are City block, Euclidean, Chebychev, Cosine, Correlation, Hamming, Jaccard, and Mahalanobis. Five benchmark datasets have been used to test the accuracy of the KNN models with the original dataset and the normalized dataset. Mean squared error (MSE) has been utilized as a performance indicator to compare the results. 
It’s been concluded that the impact of min-max normalization on the KNN models utilizing City block, Euclidean, Chebychev, Cosine, and Correlation depends on the nature of the dataset itself, therefore, testing models on both original and normalized datasets are recommended. The performance of KNN models utilizing Hamming, Jaccard, and Mahalanobis makes no difference by adopting min-max normalization because of their ratio nature, and dataset covariance involvement in the similarity calculations. Results showed that Mahalanobis outperformed the other seven similarity measures. This research is better than its peers in terms of reliability, and quality because it depended on testing different datasets from different application fields.\",\"PeriodicalId\":8398,\"journal\":{\"name\":\"ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":1.2000,\"publicationDate\":\"2022-06-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14500/aro.10955\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14500/aro.10955","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 6
Abstract
K-nearest neighbor (KNN) is a lazy supervised learning algorithm that depends on computing the similarity between the target and its closest neighbor(s). Min-max normalization has been reported as a useful method for eliminating the impact of inconsistent ranges among attributes on the efficiency of some machine learning models, but its impact on the performance of KNN models is still not clear and needs more investigation. This research therefore examines the impact of the min-max normalization method on the regression performance of KNN models using eight different similarity measures: City block, Euclidean, Chebychev, Cosine, Correlation, Hamming, Jaccard, and Mahalanobis. Five benchmark datasets were used to test the accuracy of the KNN models on both the original and the normalized data, with mean squared error (MSE) as the performance indicator for comparing the results. The study concludes that the impact of min-max normalization on KNN models using City block, Euclidean, Chebychev, Cosine, and Correlation depends on the nature of the dataset itself; testing models on both the original and the normalized datasets is therefore recommended. The performance of KNN models using Hamming, Jaccard, and Mahalanobis is unaffected by min-max normalization, because the first two are ratio-based measures and the third incorporates the dataset covariance into its similarity calculation. The results showed that Mahalanobis outperformed the other seven similarity measures. This research improves on its peers in reliability and quality because it tests datasets drawn from several different application fields.
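The paper's own code is not given here; the following is a minimal Python sketch, under stated assumptions, of the kind of experiment the abstract describes: fit brute-force KNN regressors under several of the eight similarity measures, once on the raw features and once on min-max-normalized features, and compare test MSE. scikit-learn's diabetes dataset is only a stand-in for the paper's five benchmark datasets, and k = 5 is an arbitrary choice; Hamming and Jaccard are omitted because they are intended for binary/categorical attributes.

```python
# Sketch (not the paper's code): KNN regression MSE on raw vs. min-max-normalized
# features under several distance metrics, on a stand-in dataset.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Min-max normalization: x' = (x - min) / (max - min), fitted on training data only.
scaler = MinMaxScaler().fit(X_train)
X_train_n, X_test_n = scaler.transform(X_train), scaler.transform(X_test)

def knn_mse(Xtr, Xte, metric, **metric_params):
    """Fit a brute-force KNN regressor with the given metric and return test MSE."""
    model = KNeighborsRegressor(n_neighbors=5, algorithm="brute",
                                metric=metric, metric_params=metric_params or None)
    model.fit(Xtr, y_train)
    return mean_squared_error(y_test, model.predict(Xte))

for metric in ["cityblock", "euclidean", "chebyshev", "cosine", "correlation"]:
    print(f"{metric:12s} raw: {knn_mse(X_train, X_test, metric):8.1f}  "
          f"normalized: {knn_mse(X_train_n, X_test_n, metric):8.1f}")

# Mahalanobis needs the inverse covariance of the training features; the
# covariance is re-estimated on whichever version of the data is used.
vi_raw = np.linalg.inv(np.cov(X_train, rowvar=False))
vi_norm = np.linalg.inv(np.cov(X_train_n, rowvar=False))
print(f"mahalanobis  raw: {knn_mse(X_train, X_test, 'mahalanobis', VI=vi_raw):8.1f}  "
      f"normalized: {knn_mse(X_train_n, X_test_n, 'mahalanobis', VI=vi_norm):8.1f}")
```

Because min-max scaling is a per-feature affine map, the Mahalanobis distance computed with the covariance of the scaled data is mathematically unchanged, so the two Mahalanobis lines should agree; any raw-vs-normalized gap for the other metrics depends on the dataset, which is consistent with the abstract's conclusion.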