Willdan Aprizal Arifin, I. Ariawan, A. A. Rosalia, L. Lukman, Nabila Tufailah
Data scaling performance on various machine learning algorithms to identify abalone sex
Jurnal Teknologi dan Sistem Komputer, published 2021-08-10 (journal article)
DOI: 10.14710/jtsiskom.2021.14105
Citations: 0
Abstract
This study analyzes how data scaling affects the performance of several machine learning algorithms, in order to assess how effective scaling is. Two scaling techniques, min-max scaling (normalization) and zero-mean scaling (standardization), were applied to the physical measurement features of the abalone dataset. Models were evaluated with 10-fold cross-validation. The scaled abalone data were used with Random Forest, Naïve Bayes, Decision Tree, and SVM (with RBF and linear kernels). On the eight features of the abalone dataset, data scaling had little influence on the performance of the machine learning algorithms overall. Scaling increased the performance of SVM but decreased the performance of Random Forest. Random Forest achieved the highest average balanced accuracy (74.87%) without data scaling.
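The evaluation protocol described above (scale the features, then score a classifier with 10-fold cross-validation and balanced accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. The abstract does not say whether scaling statistics were computed on the full dataset or per training fold, so this sketch assumes the latter to keep test-fold values from leaking into the scaler. The function names, the population standard deviation for z-scores, the contiguous unshuffled folds, and the `nearest_centroid` classifier are all hypothetical choices for illustration; `nearest_centroid` is a simple distance-based stand-in, not one of the paper's models. Balanced accuracy is computed as the mean of per-class recall.

```python
from statistics import mean, pstdev

# --- Data scaling (parameters are fit on training rows only) ---

def fit_min_max(rows):
    """Per-feature (min, max) for min-max normalization."""
    return [(min(col), max(col)) for col in zip(*rows)]

def apply_min_max(rows, params):
    """x' = (x - min) / (max - min); a constant feature maps to 0.0."""
    return [[(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, (lo, hi) in zip(row, params)] for row in rows]

def fit_z_score(rows):
    """Per-feature (mean, population std) for zero-mean standardization."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]

def apply_z_score(rows, params):
    """z = (x - mean) / std; a constant feature maps to 0.0."""
    return [[(x - mu) / sd if sd > 0 else 0.0
             for x, (mu, sd) in zip(row, params)] for row in rows]

SCALERS = {"min-max": (fit_min_max, apply_min_max),
           "z-score": (fit_z_score, apply_z_score)}

# --- Evaluation ---

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, label in enumerate(y_true) if label == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def nearest_centroid(train_X, train_y, test_X):
    """Hypothetical stand-in classifier (not one of the paper's models)."""
    groups = {}
    for row, label in zip(train_X, train_y):
        groups.setdefault(label, []).append(row)
    centroids = {c: [mean(col) for col in zip(*rows)] for c, rows in groups.items()}

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(centroids, key=lambda c: dist2(row, centroids[c])) for row in test_X]

def cross_validate(X, y, classify, scaler=None, k=10):
    """Mean balanced accuracy over k contiguous, unshuffled folds (needs len(X) >= k)."""
    n, start, scores = len(X), 0, []
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        test_idx = set(range(start, start + size))
        start += size
        train_X = [X[j] for j in range(n) if j not in test_idx]
        train_y = [y[j] for j in range(n) if j not in test_idx]
        test_X = [X[j] for j in sorted(test_idx)]
        test_y = [y[j] for j in sorted(test_idx)]
        if scaler is not None:
            fit, apply = SCALERS[scaler]
            params = fit(train_X)  # statistics come from the training fold only
            train_X, test_X = apply(train_X, params), apply(test_X, params)
        predictions = classify(train_X, train_y, test_X)
        scores.append(balanced_accuracy(test_y, predictions))
    return mean(scores)
```

Because the scaler is fit on each training fold, test-fold values can fall outside [0, 1] after min-max scaling; that is expected and avoids optimistic estimates. A fuller replication would shuffle or stratify the folds (contiguous folds can be class-imbalanced) and swap in the four models named above, for example with scikit-learn's `MinMaxScaler` or `StandardScaler` inside a `Pipeline` scored by `cross_val_score(..., cv=10, scoring="balanced_accuracy")`.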