Improving Classification Accuracy of Scikit-learn Classifiers with Discrete Fuzzy Interval Values

2020 International Conference on Computational Intelligence (ICCI). Pub Date: 2020-10-08. DOI: 10.1109/ICCI51257.2020.9247696

Muhammad Nur Fikri Hishamuddin, M. Hassan, D. Tran, A. Mokhtar


Citations: 2

Abstract

Understanding machine learning (ML) algorithms from scratch is time-consuming. Software and library packages such as Weka and Scikit-learn have therefore been introduced to help researchers run simulations on a number of well-known classifiers. In ML, different classifiers perform differently, and this depends on factors such as the type of data used as input for the classification phase. It is therefore necessary to discretize continuous data for classifiers that perform better with discrete data. However, in data mining, depending solely on discretization is not enough, because real-world data can be large, imprecise, and noisy. In addition, knowledge representation is needed to help researchers better understand the data during the discretization process. The objective of this study is therefore to observe the effect of fuzzy elements in the discretization phase on the classification accuracy of Scikit-learn classifiers. Fuzzy logic is proposed to assist the existing discretization technique through fuzzy membership graphs, linguistic variables, and discrete interval values. All classifiers in the Scikit-learn package were used in the classification phase and evaluated with 10-fold cross-validation. The simulation results showed that using fuzzy logic to assist the discretization process slightly improved the classification accuracy of classifiers such as Random Forest (an ensemble method) and Naive Bayes, while slightly degrading the performance of other classifiers.
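
The abstract does not give the membership functions or linguistic terms used in the study, so the following is only a minimal sketch of what fuzzy-assisted discretization could look like. It assumes three hypothetical linguistic terms ("low", "medium", "high"), triangular membership functions spread evenly over the observed range of a feature, and assignment of each value to the term with the highest membership degree. The names `TERMS`, `triangular`, and `fuzzy_discretize` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical linguistic terms; the paper's actual terms are not stated in the abstract.
TERMS = ("low", "medium", "high")


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership degree of x, with feet at a and c and peak at b.

    Setting a == b or b == c gives a one-sided (shoulder) triangle.
    """
    if x == b:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if b < x < c:
        return (c - x) / (c - b)
    return 0.0


def fuzzy_discretize(values: Sequence[float]) -> List[Tuple[int, str, float]]:
    """Map each continuous value to (interval code, linguistic term, membership degree).

    Assumption: the three terms are spread evenly over [min(values), max(values)].
    """
    lo, hi = min(values), max(values)
    mid = (lo + hi) / 2.0
    # (a, b, c) parameters for "low", "medium", and "high", in that order.
    params = [(lo, lo, mid), (lo, mid, hi), (mid, hi, hi)]

    result = []
    for v in values:
        degrees = [triangular(v, a, b, c) for a, b, c in params]
        # Pick the term with the highest membership; ties go to the first term.
        best = max(range(len(TERMS)), key=lambda i: degrees[i])
        result.append((best, TERMS[best], degrees[best]))
    return result
```

Calling `fuzzy_discretize` on one feature column yields an integer code per sample that can replace the continuous value. Those codes could then be evaluated with a Scikit-learn classifier under 10-fold cross-validation, for example through `sklearn.model_selection.cross_val_score` with `cv=10`. Taking the maximum-membership term is one design choice among several; keeping the membership degree as an extra feature would be another.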
