Influence of Relief Feature Selection on Random Forest and Support Vector Machine Classification Algorithm

2022 7th International Workshop on Big Data and Information Security (IWBIS). Pub Date: 2022-10-01. DOI: 10.1109/IWBIS56557.2022.9924782

Iustisia Natalia Simbolon, Romual Naibaho

Influence of Relief Feature Selection on Random Forest and Support Vector Machine Classification Algorithm

Classification is a widely used machine learning technique for helping people decide the target class of a data instance. Classification models can always be refined, and they continue to evolve over time: there is no surefire way to build a perfect model, but there are ways to make a model better. This study applies feature selection to improve the accuracy of a classification model. Of the many feature selection algorithms, this research chooses Relief and combines it with two classification algorithms, Random Forest and Support Vector Machine. Grid Search Optimization is used to select the most influential features and also to select the best hyperparameters for building the classification model. K-Fold Cross Validation is used to split the data set in order to find the best data-splitting proportion. Comparing accuracy before and after feature selection, both classification algorithms significantly outperform their counterparts trained without feature selection. Validation on new data also showed that the model performs quite well in real-world use.
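The abstract does not say which Relief variant or implementation the authors used, so the following is only a minimal sketch of the basic idea, under stated assumptions: two classes, numeric features, range-normalized absolute differences, and one pass over every sample instead of random sampling. The function names `relief_weights` and `top_k_features` and the toy data are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def relief_weights(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Score each feature with a basic Relief pass over every sample.

    Assumption: this is a textbook-style Relief, not the authors' code.
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    spans = []
    for j in range(d):
        column = [row[j] for row in X]
        spans.append((max(column) - min(column)) or 1.0)

    def diff(a: Sequence[float], b: Sequence[float], j: int) -> float:
        return abs(a[j] - b[j]) / spans[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # no same-class or other-class neighbour available
        near_hit = min(hits, key=lambda k: dist(X[i], X[k]))
        near_miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward features that differ on the nearest miss,
            # penalize features that differ on the nearest hit.
            weights[j] += (diff(X[i], X[near_miss], j)
                           - diff(X[i], X[near_hit], j)) / n
    return weights


def top_k_features(weights: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features, in column order."""
    order = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return sorted(order[:k])


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
selected = top_k_features(weights, 1)
print(weights, selected)
```

In a pipeline like the one the abstract describes, the columns in `selected` would be passed to the Random Forest or Support Vector Machine classifier. The number of features kept (`k` here) is the kind of setting a grid search could tune with K-fold cross-validation.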
