Enhancement of K-means clustering in big data based on equilibrium optimizer algorithm

IF 2.1 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Sarah Ghanim Mahmood Al-kababchee, Z. Algamal, O. Qasim
Journal of Intelligent Systems, published 2023-01-01
DOI: https://doi.org/10.1515/jisys-2022-0230
Citations: 1

Abstract

Clustering, a primary data-mining method, has several uses, including gene analysis. It is an unsupervised learning problem in which a set of unlabeled data is divided into clusters based on the features of the data, so that data within a cluster are more similar to one another than to data in other clusters. However, the performance of the K-means algorithm depends directly on the number of clusters. Finding the best solutions to such real-world optimization problems requires techniques that explore the search space properly. This research proposes an enhancement of K-means clustering that applies an equilibrium optimization approach. The proposed approach adjusts the number of clusters while simultaneously selecting the best attributes in order to find the optimal solution. Results on five datasets show that the proposed method performs better than existing traditional algorithms in terms of both intra-cluster distance (the distances between elements within the same cluster) and the Rand index. In conclusion, the proposed technique can be successfully employed for data clustering.
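The two evaluation measures named in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact formulas, so this assumes the intra-cluster distance is the sum of Euclidean distances from each point to its assigned centroid and uses the standard pairwise definition of the Rand index. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def intra_cluster_distance(points, labels, centroids):
    """Sum of Euclidean distances from each point to its assigned centroid.

    Assumed definition; the paper may use a squared or averaged variant.
    """
    return sum(dist(p, centroids[k]) for p, k in zip(points, labels))


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees when both labelings put the two points in the same
    cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy data (hypothetical): two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
centroids = [(0.0, 0.5), (5.0, 5.5)]
truth = [0, 0, 1, 1]
labels = [0, 0, 1, 1]

print(intra_cluster_distance(points, labels, centroids))
print(rand_index(truth, labels))
```

A lower intra-cluster distance indicates tighter clusters, and a Rand index closer to 1 indicates closer agreement with the reference labels. An optimizer that tunes the number of clusters and the selected attributes, as the abstract describes, could use measures of this kind to compare candidate solutions.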
Source journal
Journal of Intelligent Systems (Computer Science, Artificial Intelligence)
CiteScore: 5.90
Self-citation rate: 3.30%
Articles published: 77
Review time: 51 weeks
About the journal: The Journal of Intelligent Systems publishes research and review papers, as well as Brief Communications, at an interdisciplinary level, with the field of intelligent systems as the focal point. This field includes areas such as artificial intelligence; models and computational theories of human cognition, perception, and motivation; brain models; artificial neural nets; and neural computing. It covers contributions from the social, human, and computer sciences to the analysis and application of information technology.