Feature Relevancy Evaluation Based on Entropy Information

Sarah Alma P. Bentir, A. Ballado, Merl James P. Macawile
2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), November 2018. DOI: 10.1109/HNICEM.2018.8666381

Abstract

With the huge amount of data available from even a simple raw dataset, the relevancy of features has become an important part of data mining. However, most comparisons of classification results evaluate only classification performance while compromising the quality of the attributes. Hence, this paper focuses on entropy evaluation based on bits per instance and compares two feature selection methods, a filter and a wrapper: InfoGain and Wrapper Subset Evaluation using the J48 algorithm. The features selected by both methods are encoded into an expert machine classifier to evaluate the attributes that form the output rules. The overall classifier accuracy achieved by the filter method is 97.9752%, while the wrapper method achieved 98.0422%. The log-loss of the prior class probabilities with respect to entropy is 0.2488 bits/instance for both the filter method with 28 numeric attributes and the wrapper method with 8 numeric attributes. On the other hand, for the log-loss that shows the class complexity of the scheme, the wrapper method produced a better result, at 1.3381 bits/instance, than the filter method, at 6.2335 bits/instance. Lastly, the log-loss information in this study provided sufficient insight not only into classifier performance with respect to each class but also into the model produced.
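The quantities the abstract reports can be made concrete with a short sketch: Shannon entropy of the class labels, information gain of a feature (the basis of filter-style InfoGain ranking), and mean log-loss in bits per instance. This is a minimal illustration only, not the authors' WEKA-based setup; the function names and toy data below are assumptions for demonstration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature_values, labels):
    """Information gain of a discrete feature with respect to the class labels:
    H(class) minus the weighted entropy of the partitions induced by the feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def log_loss_bits(y_true, probs):
    """Mean log-loss in bits per instance: average -log2 of the probability
    the model assigned to each instance's true class."""
    return -sum(math.log2(p[y]) for y, p in zip(y_true, probs)) / len(y_true)

# Toy example: a feature that perfectly separates the two classes
labels = [0, 0, 1, 1]
feature = ['a', 'a', 'b', 'b']
print(entropy(labels))              # 1.0 bit
print(info_gain(feature, labels))   # 1.0 bit (feature removes all class uncertainty)
```

A classifier assigning probability 0.5 to every class would score exactly 1.0 bit/instance on a balanced two-class problem, which is why lower bits/instance values (such as the 1.3381 reported for the wrapper scheme) indicate a less complex, better-fitting model.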