Enhancement of K-means clustering in big data based on equilibrium optimizer algorithm

Impact factor: 2.0 | JCR quartile: Q3 | Category: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Systems | Pub Date: 2023-01-01 | DOI: 10.1515/jisys-2022-0230

Sarah Ghanim Mahmood Al-kababchee, Z. Algamal, O. Qasim


Citations: 1

Abstract



Clustering is a primary data mining method with many uses, including gene analysis. It is an unsupervised learning problem in which a set of unlabeled data is divided into clusters based on data features, so that data within a cluster are more similar to one another than to data in other clusters. However, the performance of the K-means algorithm depends directly on the number of clusters. Finding the best solutions to such real-world optimization problems requires techniques that explore the search space properly. This research proposes an enhancement of K-means clustering based on an equilibrium optimization approach. The proposed approach adjusts the number of clusters while simultaneously selecting the best attributes. On five datasets, the proposed method outperforms existing traditional algorithms in terms of intra-cluster distance and Rand index. In conclusion, the proposed technique can be successfully employed for data clustering and can offer significant support.
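The two evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas used in the paper, so this assumes the common definitions: intra-cluster distance as the sum of Euclidean distances from each point to its cluster centroid, and the Rand index as the fraction of point pairs on which two labelings agree. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def intra_cluster_distance(points, labels):
    """Sum of Euclidean distances from each point to its cluster centroid.

    Assumed definition; the paper's exact formula is not in the abstract.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        # Centroid = coordinate-wise mean of the cluster's members.
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(dist(p, centroid) for p in members)
    return total


def rand_index(labels_true, labels_pred):
    """Fraction of point pairs on which the two labelings agree.

    A pair agrees if both labelings put it in the same cluster,
    or both put it in different clusters.
    """
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Toy data (hypothetical, not from the paper): two well-separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
truth = [0, 0, 1, 1]
good = [1, 1, 0, 0]  # same partition as truth, different label names
bad = [0, 1, 0, 1]   # splits both true groups

print(intra_cluster_distance(points, good), rand_index(truth, good))
print(intra_cluster_distance(points, bad), rand_index(truth, bad))
```

A lower intra-cluster distance indicates more compact clusters, and a Rand index closer to 1 indicates closer agreement with the reference labels. The Rand index depends only on the partition, not on which integer names each cluster.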


Source journal

Journal of Intelligent Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore: 5.90

Self-citation rate: 3.30%

Articles published:

Review time: 51 weeks

About the journal: The Journal of Intelligent Systems aims to provide research and review papers, as well as Brief Communications at an interdisciplinary level, with the field of intelligent systems providing the focal point. This field includes areas like artificial intelligence, models and computational theories of human cognition, perception and motivation; brain models, artificial neural nets and neural computing. It covers contributions from the social, human and computer sciences to the analysis and application of information technology.