The Speed and Accuracy Evaluation of Random Forest Performance by Selecting Features in the Transformation Data

Proceedings of the 2020 The 9th International Conference on Informatics, Environment, Energy and Applications
Published: 2020-03-13
DOI: 10.1145/3386762.3386768

Maria Irmina Prasetiyowati, N. Maulidevi, K. Surendro


Citations: 3

Abstract

Random Forest is a machine learning method that builds many decision trees in a forest and obtains the classification result by voting. Because the features used to build each tree are chosen at random, some of the chosen features may not be informative. Feature selection is therefore needed to speed up the process. This study uses Correlation-based Feature Selection (CFS) with the best-first search method. Trials on six high-dimensional datasets showed that feature selection reduced the number of features by 15% to 96%. The average time needed to execute a Random Forest on the feature-selected data is less than on a dataset without feature selection. This also holds for datasets that were transformed using the Fast Fourier Transform (FFT) and then returned using the Inverse Fast Fourier Transform (IFFT). For the transformed datasets, average accuracy increased by 0.03 to 0.08% compared with the untransformed datasets. The FFT is used to test whether transforming the data improves the performance of the Random Forest.
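
The abstract summarizes Random Forest as many trees built from randomly chosen features and combined by voting. As a rough illustration only (not the implementation used in this study), the following pure-Python sketch grows small CART-style trees on bootstrap samples, considers a random subset of about sqrt(d) features at each split, scores splits with Gini impurity, and takes a majority vote. The helper names, the depth limit, and these defaults are assumptions chosen for the example.

```python
import random
from collections import Counter

# Hypothetical helpers: a simplified Random Forest sketch with bootstrap
# samples, a random feature subset at every split, and a majority vote.


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth, max_depth, n_feats, rng):
    """Grow a small CART-style tree; leaves hold the majority class."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    # Only a random subset of the features is considered at this split.
    candidates = rng.sample(range(len(rows[0])), n_feats)
    best = None  # (weighted Gini impurity, feature index, threshold)
    for f in candidates:
        values = sorted(set(r[f] for r in rows))
        for t in values[:-1]:  # excluding the max keeps both children non-empty
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # every candidate feature is constant in this node
        return majority
    _, f, t = best
    left_rows = [r for r in rows if r[f] <= t]
    left_labels = [y for r, y in zip(rows, labels) if r[f] <= t]
    right_rows = [r for r in rows if r[f] > t]
    right_labels = [y for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(left_rows, left_labels, depth + 1, max_depth, n_feats, rng),
        "right": build_tree(right_rows, right_labels, depth + 1, max_depth, n_feats, rng),
    }


def predict_tree(node, row):
    """Follow the splits until a leaf (a class label) is reached."""
    while isinstance(node, dict):
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node


def fit_forest(X, y, n_trees=25, max_depth=3, n_feats=None, seed=0):
    """Train n_trees trees, each on a bootstrap sample of (X, y)."""
    rng = random.Random(seed)
    if n_feats is None:
        n_feats = max(1, int(len(X[0]) ** 0.5))  # about sqrt(d) features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the predictions of all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Toy usage: only feature 0 carries the class; features 1-3 are noise.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(100)]
y = [int(row[0] > 0.5) for row in X]
forest = fit_forest(X, y)
print(predict_forest(forest, [0.05, 0.5, 0.5, 0.5]), predict_forest(forest, [0.95, 0.5, 0.5, 0.5]))
```

In practice a library implementation, such as scikit-learn's `RandomForestClassifier`, would normally be used; the sketch only makes the random feature sampling and the vote explicit.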
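
CFS scores a feature subset by how strongly its features correlate with the class and how weakly they correlate with each other. A commonly cited form of the merit is Merit_S = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The following sketch pairs that merit with a forward best-first search that keeps an open list of subsets and stops after a fixed number of non-improving expansions. It assumes numeric columns and uses absolute Pearson correlation, whereas CFS is often computed with symmetrical uncertainty on discretized features; the stopping limit of 5 and all helper names are likewise assumptions, not details taken from the paper.

```python
import heapq
import math

# Hypothetical helpers: CFS merit (with absolute Pearson correlation as an
# assumed correlation measure), explored by forward best-first search.


def abs_pearson(a, b):
    """|Pearson r| between two equal-length columns; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov / math.sqrt(va * vb))


def cfs_merit(subset, r_cf, r_ff):
    """Merit_S = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff))."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[f] for f in subset) / k
    pairs = [(i, j) for i in subset for j in subset if i < j]
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def cfs_best_first(columns, target, max_stale=5):
    """Return (sorted feature indices, merit) of the best subset found."""
    d = len(columns)
    r_cf = [abs_pearson(c, target) for c in columns]
    r_ff = [[abs_pearson(columns[i], columns[j]) for j in range(d)] for i in range(d)]
    start = frozenset()
    # Open list ordered by highest merit first (heapq is a min-heap, so merits
    # are negated); the counter breaks ties without comparing frozensets.
    open_list = [(-0.0, 0, start)]
    seen = {start}
    best_subset, best_merit = start, -math.inf
    stale, counter = 0, 1
    while open_list and stale < max_stale:
        neg_merit, _, subset = heapq.heappop(open_list)
        if -neg_merit > best_merit:
            best_subset, best_merit, stale = subset, -neg_merit, 0
        else:
            stale += 1  # this expansion did not improve on the best subset
        for f in range(d):  # forward step: add one feature at a time
            child = subset | {f}
            if child in seen:
                continue
            seen.add(child)
            heapq.heappush(open_list, (-cfs_merit(child, r_cf, r_ff), counter, child))
            counter += 1
    return sorted(best_subset), best_merit


# Toy usage: column 0 equals the label; columns 1 and 2 are uncorrelated with it.
y = [0, 0, 0, 0, 1, 1, 1, 1]
cols = [list(y), [0, 1, 0, 1, 0, 1, 0, 1], [0, 0, 1, 1, 0, 0, 1, 1]]
print(cfs_best_first(cols, y))
```

On this toy data the search keeps only column 0, with a merit of 1.0: adding either uncorrelated column raises k without raising the mean feature-class correlation, so the merit drops.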
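
The abstract does not state along which axis the FFT is applied or whether any frequency components are altered before the inverse transform. The following sketch assumes the simplest reading: each record (row) is transformed with the FFT and immediately returned with the IFFT, keeping the real part. It uses a recursive radix-2 Cooley-Tukey FFT, so record length is assumed to be a power of two; `fft`, `ifft`, and `fft_round_trip` are hypothetical helpers, and `numpy.fft.fft` / `numpy.fft.ifft` would normally be used instead.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def ifft(spectrum):
    """Inverse FFT using the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spectrum)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in spectrum])]


def fft_round_trip(rows):
    """Apply the FFT to each record, then return it with the IFFT.

    The real part is kept so the result can be used as ordinary numeric
    features; no frequency-domain filtering is applied in between.
    """
    return [[v.real for v in ifft(fft(row))] for row in rows]


# Toy usage with records of length 4 (a power of two).
records = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.5, 0.0]]
print(fft(records[0]))
print(fft_round_trip(records))
```

Because nothing is changed in the frequency domain, the round trip reproduces each record up to floating-point rounding, which is a useful sanity check before adding any filtering step.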
