Studying the Classification Accuracy Performance when Representation is Changed on Several Classifier Techniques

Ehab A. Omer A. Omer, Wisam H. Benamer
Published in: Proceedings of the International Conference on High Performance Compilation, Computing and Communications
Publication date: 2017-03-22
DOI: 10.1145/3069593.3069597 (https://doi.org/10.1145/3069593.3069597)
Citation count: 2

Abstract

Introduction: When building a predictive data mining model, achieving the highest possible accuracy is a major concern for researchers, so studying the impact of data representation on classification accuracy is essential. Recent studies move among classifier techniques in search of suitable, higher classification accuracy on which to build strong models. The goal of this research is to add a further dimension by focusing on the effect that data representation may have on the classification accuracy of predictive data mining techniques.
Methods: Seven data representations were evaluated on several classifier techniques: the AS_IS representation, three binary representations (simple binary, flag, and thermometer), and three normalizations (min-max, sigmoidal, and standard deviation). These seven representations were applied to eight classifiers: Neural Network, Logistic Regression, K-Nearest Neighbor, Support Vector Machine, Classification Tree, Naive Bayesian, Rule-Based, and Random Forest Decision Tree. Classification accuracy was tested on two datasets, Wisconsin Breast Cancer and German Credit, both of which have a Boolean target class.
Results: Applying the seven representations to each of the two datasets, Wisconsin Breast Cancer and German Credit, produced fourteen data representations, which were run through the classifier techniques using the Orange software. The results showed that classification accuracy varied for all classifiers. Naive Bayesian showed a difference of over 60% between its lowest and highest accuracy; for all other classifier techniques, classification accuracy varied by around 4.2%.
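
The six non-AS_IS representations named in the Methods can be sketched in a few lines of Python. The abstract does not give the formulas the authors used, so everything below is an assumption based on commonly used definitions: the function names are hypothetical, sigmoidal normalization is taken to be a logistic function applied to z-scores, flag is taken to be one indicator bit per level (one-hot), and thermometer is taken to set every bit up to and including the level. Orange's own preprocessing may differ in detail.

```python
import math
from statistics import mean, pstdev


def min_max(values, lo=0.0, hi=1.0):
    """Rescale values linearly so the smallest maps to lo and the largest to hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # constant column: avoid division by zero
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def z_score(values):
    """Standard deviation normalization: subtract the mean, divide by the std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: every value equals the mean
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sigmoidal(values):
    """Assumed variant: pass z-scores through a logistic curve into (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(values)]


def flag(level, n_levels):
    """Assumed one-hot code: only the bit for `level` is set."""
    return [1 if i == level else 0 for i in range(n_levels)]


def thermometer(level, n_levels):
    """Assumed ordinal code: bits 0..level are set, so higher levels fill up."""
    return [1 if i <= level else 0 for i in range(n_levels)]


def simple_binary(level, n_levels):
    """Base-2 digits of the level index, most significant bit first."""
    width = max(1, (n_levels - 1).bit_length())
    return [(level >> shift) & 1 for shift in range(width - 1, -1, -1)]
```

For an attribute with four levels, level index 2 would become `[0, 0, 1, 0]` under flag, `[1, 1, 1, 0]` under thermometer, and `[1, 0]` under simple binary, which shows how the three binary representations trade input width against preserved ordering.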