KFCM-PSOTD: An Imputation Technique for Missing Values in Incomplete Data Classification

Muhaimin Ilyas, Syaiful Anam, Trisilowati Trisilowati
Journal: CAUCHY: Jurnal Matematika Murni dan Aplikasi, vol. 9
Published: 2024-05-16
DOI: 10.18860/ca.v9i1.25138 (https://doi.org/10.18860/ca.v9i1.25138)

Abstract

Data mining is an important process for extracting interpretable information from data, and data preprocessing is one of its crucial steps. Missing values are one of the primary issues in data preprocessing. They are commonly handled with mean or median imputation because these techniques are easy to implement. However, their use is not recommended because they ignore the variance of the data. This research develops Kernel Fuzzy C-Means optimized by the Particle Swarm Optimizer with Two Differential Mutations (KFCM-PSOTD). KFCM imputation is applied to obtain better estimates because of its proven ability to recognize patterns in the data. In addition, the PSOTD algorithm is used as an optimization tool to boost the performance of KFCM. PSOTD is adopted because it balances exploration and exploitation better than classical PSO. Datasets imputed with KFCM-PSOTD are classified using the Decision Tree algorithm. The results are evaluated using accuracy, precision, recall, and F1 score to determine the quality of the imputed values. The results show that KFCM-PSOTD performs better than other imputation techniques, with evaluation scores up to 10% higher.
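
The abstract's remark that mean imputation ignores the data variance can be illustrated with a small sketch. This is not the KFCM-PSOTD method and not the authors' code; the column values and the helper name `mean_impute` are hypothetical, chosen only to show that filling gaps with the mean shrinks the variance of a feature.

```python
import statistics


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical feature column; None marks a missing value.
column = [2.0, None, 4.0, 6.0, None, 8.0]

observed = [v for v in column if v is not None]
imputed = mean_impute(column)

# Population variance before and after imputation.
var_observed = statistics.pvariance(observed)
var_imputed = statistics.pvariance(imputed)

print(f"variance of observed values: {var_observed:.2f}")  # 5.00
print(f"variance after mean imputation: {var_imputed:.2f}")  # 3.33
```

Every imputed entry sits exactly at the mean, so it adds nothing to the sum of squared deviations while increasing the sample count. The variance therefore drops, which is the weakness that motivates pattern-aware imputation such as the clustering-based approach described in the abstract.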