Hybrid Model of Correlation Based Filter Feature Selection and Machine Learning Classifiers Applied on Smart Meter Data Set

2019 IEEE/ACM Symposium on Software Engineering in Africa (SEiA). Pub Date: 2019-05-01. DOI: 10.1109/SEiA.2019.00009

Sinayobye Janvier Omar, Fred N. Kiwanuka, Kaawaase Kyanda Swaib, M. Richard


Citations: 7

Abstract

Feature selection is the process of obtaining a subset of an original feature set according to a given selection criterion, so that only the relevant features of the dataset are kept. It reduces the scale of data processing by removing redundant and irrelevant features. Feature selection techniques show that more information is not always better in machine learning applications. By applying different algorithms to the data at hand and comparing them against baseline classification performance values, we can choose a final feature selection algorithm. In this paper, we propose a hybrid classification model that combines a correlation-based filter feature selection algorithm with machine learning classifiers. The objective of this study is to select relevant features and identify the best-performing machine learning algorithms, in order to train our model, make predictions, and compare classification performance. In this method, features are ordered by the absolute value of their correlation with the class attribute. The top K features are then selected from the ordered list to form a reduced dataset. The proposed model is applied to our smart meter datasets. To measure the performance of the selected features, seven benchmark classifiers are used: Random Forest (RF), Logistic Regression (LR), k-Nearest Neighbor (kNN), Naïve Bayes (NB), Decision Tree (DT), Linear Discriminant Analysis (LDA), and Support Vector Machine (SVM). This paper then analyzes the performance of all classifiers with feature selection in terms of accuracy, sensitivity, F-measure, specificity, precision, and Matthews correlation coefficient (MCC). In our experiments, the Random Forest classifier outperformed the other classifiers.
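The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Pearson correlation as the correlation measure and a binary class label, and the feature names (`mean_load`, `night_ratio`, `noise`) and toy values are hypothetical. The helper `binary_metrics` shows how the listed evaluation measures follow from confusion-matrix counts.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_k_features(columns, labels, k):
    """Order features by |correlation with the class label| and keep the top k names."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def binary_metrics(tp, fp, tn, fn):
    """Evaluation measures named in the abstract, from binary confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # avoid division by zero on empty cells

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # also called recall
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical toy data: three features for six meters, with a binary class label.
columns = {
    "mean_load": [1, 2, 3, 4, 5, 6],
    "night_ratio": [9, 8, 8, 2, 1, 1],
    "noise": [3, 1, 2, 3, 1, 2],
}
labels = [0, 0, 0, 1, 1, 1]

selected, scores = top_k_features(columns, labels, k=2)
reduced = {name: columns[name] for name in selected}  # the reduced dataset
```

The reduced dataset would then be passed to each of the seven classifiers, and the measures from `binary_metrics` compared across them. Because only the absolute value of the correlation is used, a strongly negatively correlated feature such as `night_ratio` ranks as high as a strongly positive one.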
