Maria Irmina Prasetiyowati, N. Maulidevi, K. Surendro
The Speed and Accuracy Evaluation of Random Forest Performance by Selecting Features in the Transformation Data
Proceedings of the 2020 The 9th International Conference on Informatics, Environment, Energy and Applications, published 2020-03-13
DOI: 10.1145/3386762.3386768
Citation count: 3
Abstract
Random Forest is a machine learning method that builds several trees in a forest and obtains the classification result by voting. Because the features used to build each tree are chosen randomly, a chosen feature is not necessarily informative. Feature selection is needed to speed up the process. This study uses Correlation-based Feature Selection (CFS) with the best-first search method. In trials on six high-dimensional datasets, feature selection reduced the number of features by 15% to 96%. On average, a Random Forest runs in less time on the selected features than on a dataset without feature selection. This also holds for datasets that have been transformed using the Fast Fourier Transform (FFT) and returned using the Inverse Fast Fourier Transform (IFFT). For the transformed datasets, the average accuracy increased by 0.03 to 0.08% compared with the untransformed datasets. FFT is used to test whether transforming the data enhances Random Forest performance.
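As a rough illustration of the feature-selection step, the sketch below implements CFS with a forward best-first search in pure Python. It scores a subset S of k features with the standard CFS merit, Merit_S = k * mean(r_cf) / sqrt(k + k(k - 1) * mean(r_ff)), where r_cf is the feature-class correlation and r_ff the feature-feature correlation, so subsets whose features correlate with the class but not with each other score highest. Several choices are assumptions, not details from the paper: absolute Pearson correlation as the correlation measure (CFS implementations often use symmetrical uncertainty on discretized data instead), the hypothetical function names `pearson`, `cfs_merit`, and `best_first_cfs`, and the stopping rule `max_stale`, which ends the search after that many consecutive expansions find no better subset.

```python
import heapq
import math
from typing import FrozenSet, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset: FrozenSet[int], r_cf: List[float], r_ff: List[List[float]]) -> float:
    """CFS merit: k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[i] for i in subset) / k
    pairs = [(i, j) for i in subset for j in subset if i < j]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def best_first_cfs(
    columns: List[Sequence[float]], target: Sequence[float], max_stale: int = 5
) -> Tuple[List[int], float]:
    """Forward best-first search over feature subsets, scored by CFS merit (a sketch)."""
    n = len(columns)
    # Assumption: absolute Pearson correlation stands in for CFS's correlation measure.
    r_cf = [abs(pearson(col, target)) for col in columns]
    r_ff = [[abs(pearson(columns[i], columns[j])) for j in range(n)] for i in range(n)]

    start: FrozenSet[int] = frozenset()
    # Heap entries are (negated merit, insertion counter, subset); the counter
    # breaks ties so frozensets are never compared.
    open_heap = [(0.0, 0, start)]
    visited = {start}
    best_subset, best_merit = start, 0.0
    stale, counter = 0, 1

    while open_heap and stale < max_stale:
        _, _, subset = heapq.heappop(open_heap)
        improved = False
        # Expand the most promising subset by adding one unused feature at a time.
        for f in range(n):
            child = subset | {f}
            if child in visited:
                continue
            visited.add(child)
            merit = cfs_merit(child, r_cf, r_ff)
            heapq.heappush(open_heap, (-merit, counter, child))
            counter += 1
            if merit > best_merit + 1e-12:  # small tolerance against float noise
                best_subset, best_merit = child, merit
                improved = True
        # Count consecutive expansions that found no better subset.
        stale = 0 if improved else stale + 1

    return sorted(best_subset), best_merit


if __name__ == "__main__":
    target = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0.1, 0.0, 0.2, 0.1, 0.9, 1.0, 0.8, 1.1]  # tracks the class closely
    f1 = [2 * v for v in f0]                        # redundant rescaled copy of f0
    f2 = [0.5, 0.1, 0.9, 0.3, 0.4, 0.8, 0.2, 0.6]   # mostly noise
    print(best_first_cfs([f0, f1, f2], target))
```

On the toy data, `f1` is an exact rescaling of `f0`, so adding it brings no gain in merit, and `f2` is mostly noise; the search therefore keeps only feature 0. This is the kind of redundancy pruning that lets CFS shrink the feature set before the forest is trained.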
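The abstract does not describe exactly how the transformation is applied. A minimal sketch, assuming each instance's feature vector is transformed with FFT and then returned with IFFT, keeping only the real part: it uses a textbook recursive radix-2 Cooley-Tukey FFT in pure Python, which requires the number of features to be a power of two (a real pipeline would need to zero-pad or use a library FFT that handles arbitrary lengths, such as `numpy.fft.fft`). The names `fft`, `ifft`, and `fft_round_trip` are illustrative, not taken from the paper.

```python
import cmath
from typing import List, Sequence


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size DFTs with the twiddle factor.
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def fft_round_trip(rows: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: FFT each instance (row), return it with IFFT, keep the real part."""
    return [[v.real for v in ifft(fft(row))] for row in rows]


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.5, 0.0]]
    print(fft_round_trip(data))  # values match `data` up to floating-point error
```

Because IFFT is the exact inverse of FFT, the plain round trip reproduces the original values up to floating-point rounding; the abstract does not say whether anything is done to the spectrum between the two steps.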