KFCM-PSOTD: An Imputation Technique for Missing Values in Incomplete Data Classification

CAUCHY: Jurnal Matematika Murni dan Aplikasi Pub Date : 2024-05-16 DOI:10.18860/ca.v9i1.25138

Muhaimin Ilyas, Syaiful Anam, Trisilowati Trisilowati



Abstract

Data mining is an important process for interpreting data, and data preprocessing is one of its crucial steps. Missing values are one of the primary issues in data preprocessing. They are commonly handled with mean or median imputation because these techniques are easy to implement. However, their use is not recommended because they ignore the variance of the data. This research develops Kernel Fuzzy C-Means optimized by the Particle Swarm Optimizer with Two Differential Mutations (KFCM-PSOTD). KFCM imputation is applied to obtain better estimates because of its proven ability to recognize patterns in data. In addition, the PSOTD algorithm is used as an optimization tool to boost KFCM's performance. PSOTD is adopted because it balances exploration and exploitation better than classical PSO. Datasets imputed with KFCM-PSOTD are classified using the Decision Tree algorithm. The results are evaluated using accuracy, precision, recall, and F1 score to determine the quality of the imputed values. The outcomes demonstrate that KFCM-PSOTD performs better than other imputation techniques, with evaluation scores up to 10% higher.
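
To make the idea of cluster-based imputation concrete, the following is a minimal sketch of kernel fuzzy c-means imputation in plain Python. It is not the authors' implementation. It assumes a Gaussian kernel, the common kernel FCM membership and center updates, cluster centers fitted on complete rows only, and memberships for incomplete rows computed from their observed features. The function names (`gaussian_kernel`, `memberships`, `kfcm_impute`), the parameters `m`, `sigma`, and `iters`, and the toy data are all hypothetical. The PSOTD optimization step is omitted because the abstract does not describe it in detail.

```python
import math

def gaussian_kernel(x, v, observed, sigma):
    """Gaussian kernel computed over the observed feature indices only."""
    d2 = sum((x[j] - v[j]) ** 2 for j in observed)
    return math.exp(-d2 / (2.0 * sigma ** 2))

def memberships(x, centers, sigma, m):
    """Fuzzy membership of one row in each cluster (kernel FCM form)."""
    observed = [j for j, value in enumerate(x) if value is not None]
    # The kernel-induced distance is proportional to 1 - K(x, v);
    # clamp it so a row sitting exactly on a center does not divide by zero.
    weights = [
        max(1.0 - gaussian_kernel(x, v, observed, sigma), 1e-12) ** (-1.0 / (m - 1.0))
        for v in centers
    ]
    total = sum(weights)
    return [w / total for w in weights]

def kfcm_impute(rows, init_centers, m=2.0, sigma=1.0, iters=30):
    """Fill None entries using memberships in kernel fuzzy c-means clusters.

    Assumption: centers are fitted on complete rows only, with a fixed
    number of alternating updates instead of a convergence test.
    """
    centers = [list(c) for c in init_centers]
    n_features = len(centers[0])
    complete = [r for r in rows if None not in r]

    for _ in range(iters):
        new_centers = []
        for i, v in enumerate(centers):
            num = [0.0] * n_features
            den = 0.0
            for x in complete:
                u = memberships(x, centers, sigma, m)[i]
                k = gaussian_kernel(x, v, range(n_features), sigma)
                w = (u ** m) * k
                den += w
                for j in range(n_features):
                    num[j] += w * x[j]
            # Keep the old center if every kernel weight underflowed to zero.
            new_centers.append([s / den for s in num] if den > 0 else v)
        centers = new_centers

    filled = []
    for x in rows:
        um = [u ** m for u in memberships(x, centers, sigma, m)]
        total = sum(um)
        # Each missing feature becomes a membership-weighted mix of center values.
        filled.append([
            value if value is not None
            else sum(w * c[j] for w, c in zip(um, centers)) / total
            for j, value in enumerate(x)
        ])
    return filled

# Toy data: two well-separated groups, plus two rows with a missing second feature.
rows = [
    [0.0, 0.0], [0.2, 0.1], [-0.1, 0.2],
    [5.0, 5.0], [5.1, 4.9], [4.8, 5.2],
    [0.1, None], [4.9, None],
]
filled = kfcm_impute(rows, init_centers=[[1.0, 1.0], [4.0, 4.0]])
print(filled[6], filled[7])
```

In this sketch each incomplete row takes its missing value from the cluster it most resembles on its observed features, whereas mean imputation would assign both incomplete rows the same column mean. An optimizer such as PSOTD could plausibly be used to search for better centers or parameters, but that role is an assumption here.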


