An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method

IF 2.1 Q3 MULTIDISCIPLINARY SCIENCES

ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY Pub Date : 2022-09-19 DOI:10.14500/aro.10970

Haval A. Ahmed, Peshawa J. Muhammad Ali, Abdulbasit K. Faeq, Saman M. Abdullah


Citations: 4

Abstract


An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method

Data normalization can eliminate the effect of inconsistent ranges in some machine learning (ML) techniques and speed up the optimization process in others. Many studies apply different data normalization methods to reduce or eliminate the impact of data variance on the accuracy of ML-based models. However, how the significance of this impact relates to the mathematical basis of each ML algorithm still needs more investigation and testing. To address this, this work proposes an investigation methodology involving three ML algorithms: support vector machine (SVM), artificial neural network (ANN), and Euclidean-based K-nearest neighbor (E-KNN). Five datasets are used, each taken from a different application field and with different statistical properties. Although many data normalization methods are available, this work focuses on the min-max method because it directly eliminates the effect of inconsistent ranges in the datasets. Other factors that complicate min-max normalization, such as including or excluding outliers or the least significant feature, are also considered. The findings show that each ML technique responds differently to min-max normalization. The performance of SVM models improved, while ANN models showed no significant improvement. The performance of E-KNN models may improve or degrade with min-max normalization, depending on the statistical properties of the dataset.
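To make the E-KNN observation concrete, the following is a minimal sketch of min-max normalization and its effect on Euclidean nearest-neighbor search. It assumes the conventional per-feature definition x' = (x - min) / (max - min), which the abstract does not spell out. The helper names (`min_max_normalize`, `nearest_neighbor`) and the toy data are hypothetical illustrations, not taken from the paper or its datasets.

```python
import numpy as np


def min_max_normalize(X):
    """Rescale each column of X to [0, 1] via x' = (x - min) / (max - min).

    Assumption: the standard per-feature min-max formula; constant
    columns are mapped to 0 to avoid division by zero.
    """
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)
    return (X - lo) / span


def nearest_neighbor(points, query):
    """Return the index of the row in `points` closest to `query` (Euclidean)."""
    distances = np.linalg.norm(points - query, axis=1)
    return int(np.argmin(distances))


# Hypothetical toy data: feature 0 spans about [0, 1],
# feature 1 spans about [1000, 2000], so feature 1 dominates raw distances.
data = np.array([
    [0.0, 1000.0],  # row 0 is used as the query
    [1.0, 1010.0],
    [0.1, 1100.0],
    [0.5, 2000.0],
])

# Nearest neighbor of row 0 among the other rows, before scaling.
raw_nn = nearest_neighbor(data[1:], data[0])

# Same search after min-max normalization of the whole table.
scaled = min_max_normalize(data)
scaled_nn = nearest_neighbor(scaled[1:], scaled[0])

print("nearest neighbor on raw data:   ", raw_nn)
print("nearest neighbor on scaled data:", scaled_nn)
```

On this toy table the nearest neighbor changes once both features share the [0, 1] range, because the large-range feature no longer dominates the distance. Whether such a change helps or hurts accuracy depends on the data, which is in line with the abstract's statement that E-KNN performance may improve or degrade.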


Source journal: ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY (MULTIDISCIPLINARY SCIENCES)

Self-citation rate: 33.30%

Review time: 16 weeks