The Speed and Accuracy Evaluation of Random Forest Performance by Selecting Features in the Transformation Data

Maria Irmina Prasetiyowati, N. Maulidevi, K. Surendro
Proceedings of the 2020 The 9th International Conference on Informatics, Environment, Energy and Applications. Published 2020-03-13. DOI: 10.1145/3386762.3386768 (https://doi.org/10.1145/3386762.3386768)
Citations: 3

Abstract

Random Forest is a machine learning method that builds multiple decision trees and obtains its classification result by voting among them. Because the features used to build each tree are drawn at random, a chosen feature is not necessarily informative. Feature selection is therefore needed to speed up the process. This study uses Correlation-based Feature Selection (CFS) with the best-first search method. In trials on six high-dimensional datasets, feature selection reduced the number of features by 15% to 96%. The average time needed to run Random Forest on the feature-selected datasets was lower than on datasets without feature selection. This also holds for datasets that were transformed with the Fast Fourier Transform (FFT) and then restored with the Inverse Fast Fourier Transform (IFFT). Average accuracy on the transformed datasets increased by 0.03 to 0.08% compared with the untransformed datasets. FFT is used to test whether transforming the data improves Random Forest performance.
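The idea behind Correlation-based Feature Selection can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: absolute Pearson correlation stands in for the correlation measure, a greedy forward search stands in for best-first search (which can also backtrack), and the function names and toy data are hypothetical. The FFT/IFFT step and the Random Forest training are not shown. A subset of k features is scored by the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation.

```python
from math import sqrt


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / sqrt(vx * vy))


def cfs_merit(columns, target, subset):
    """CFS merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    k = len(subset)
    r_cf = sum(pearson(columns[i], target) for i in subset) / k
    if k == 1:
        return r_cf
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    r_ff = sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(columns, target):
    """Greedy forward search (an assumed simplification of best-first):
    add the feature that raises the merit most, stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        score, idx = max(
            (cfs_merit(columns, target, selected + [i]), i) for i in remaining
        )
        if score <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = score
    return selected, best


# Hypothetical toy data: each inner list is one feature column.
target = [0, 0, 1, 1, 0, 1]
columns = [
    [0.0, 0.0, 1.0, 1.0, 0.0, 1.0],  # f0: informative
    [0.1, 0.0, 0.9, 1.0, 0.2, 1.1],  # f1: noisy near-copy of f0 (redundant)
    [5.0, 3.0, 4.0, 3.0, 4.0, 5.0],  # f2: uncorrelated with the class
]

selected, merit = greedy_cfs(columns, target)
print(selected)  # prints [0]
```

In this toy case the search keeps only f0: adding the redundant f1 or the uninformative f2 lowers the merit, which mirrors the reduction in feature count described above.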