Feature Relevancy Evaluation Based on Entropy Information
Sarah Alma P. Bentir, A. Ballado, Merl James P. Macawile
2018 IEEE 10th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM)
DOI: 10.1109/HNICEM.2018.8666381
Citations: 0
Abstract
With the huge amount of data available even from a simple raw dataset, the relevancy of features has become an important part of data mining. However, most comparisons of classification results evaluate only classification performance while neglecting the quality of the attributes. Hence, this paper focuses on entropy evaluation in bits per instance and compares two feature selection methods, a filter and a wrapper, namely InfoGain and Wrapper Subset Evaluation using the J48 algorithm. The features selected by both methods are fed to an expert machine classifier, which evaluates the attributes and outputs rules. In terms of classifier performance, the filter method achieved an overall accuracy of 97.9752% while the wrapper method achieved 98.0422%. The log-loss of the prior class probabilities with respect to their entropy was 0.2488 bits/instance for both the filter method with 28 numeric attributes and the wrapper method with 8 numeric attributes. On the other hand, the log-loss that reflects class complexity under the scheme was better for the wrapper method, at 1.3381 bits/instance, than for the filter method, at 6.2335 bits/instance. Lastly, the log-loss information in this study provided sufficient information not only on classifier performance with respect to each class but also on the model produced.
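The quantities reported above are expressed in bits per instance. As a minimal sketch (not the authors' code), the Python snippet below shows how a class prior's entropy, the information gain used by an InfoGain-style filter, and a classifier's mean log-loss in bits/instance can be computed; the function names are illustrative, and a scikit-learn decision tree is assumed only as a stand-in for the WEKA J48 classifier used in the paper.

```python
import numpy as np

def prior_entropy_bits(labels):
    """Entropy of the class prior in bits/instance: H(C) = -sum_c p(c) log2 p(c)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain_bits(labels, feature_values):
    """Information gain of a discrete attribute: H(C) - H(C | attribute)."""
    labels = np.asarray(labels)
    feature_values = np.asarray(feature_values)
    vals, counts = np.unique(feature_values, return_counts=True)
    conditional = sum(
        (c / len(labels)) * prior_entropy_bits(labels[feature_values == v])
        for v, c in zip(vals, counts)
    )
    return prior_entropy_bits(labels) - conditional

def mean_log_loss_bits(probs, labels, classes):
    """Mean log-loss in bits/instance: -(1/N) * sum_i log2 p_model(true class of i)."""
    eps = 1e-12  # guard against log2(0)
    idx = np.searchsorted(classes, labels)          # classes assumed sorted
    picked = probs[np.arange(len(labels)), idx]     # probability of the true class
    return -np.mean(np.log2(picked + eps))

# Hypothetical usage with a decision tree as a stand-in for J48:
# from sklearn.tree import DecisionTreeClassifier
# clf = DecisionTreeClassifier().fit(X_train, y_train)
# probs = clf.predict_proba(X_test)
# print(mean_log_loss_bits(probs, np.asarray(y_test), clf.classes_))
```

Under this reading, the 0.2488 bits/instance figure corresponds to evaluating log-loss against the prior class distribution, while the scheme log-loss values (1.3381 and 6.2335 bits/instance) correspond to evaluating the probabilities produced by the trained classifier on each feature subset.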