Improved K-means Clustering Algorithm by Combining with Multiple Factors

Tianqi Lei, Shuqin Li
Published in: 2021 3rd International Conference on Advances in Computer Technology, Information Science and Communication (CTISC)
Publication date: 2021-04-01
DOI: 10.1109/CTISC52352.2021.00054 (https://doi.org/10.1109/CTISC52352.2021.00054)
Citations: 2

Abstract

The traditional K-means clustering algorithm has three weaknesses: the number of clusters must be specified manually, the clustering results are sensitive to the initial cluster centers and to isolated points, and the iterative process is computationally expensive. To address these problems, an improved K-means clustering algorithm combining multi-point optimization (MFK-means) is proposed. The algorithm improves on the traditional one in four respects. First, the number of clusters is determined jointly by the silhouette coefficient method and the elbow method. Second, an optimized outlier detection algorithm is used to exclude the influence of isolated points on the clustering results. Third, the initial cluster centers are determined step by step from the outlier candidate set and the maximum-minimum distance idea. Finally, a heuristic method is used to reduce the computational cost of the iterative process. Experimental results on the UCI dataset show that the proposed algorithm achieves higher clustering accuracy and better stability than the traditional K-means algorithm and another improved algorithm.
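The abstract names the maximum-minimum distance idea for choosing initial centers but does not spell out the procedure. As a rough illustration only, the sketch below shows the generic form of that idea (often called farthest-first selection): each new center is the point farthest from its nearest already-chosen center. The function name `max_min_init`, the `first` parameter, and the sample points are assumptions made for this example. The paper's own variant also draws on an outlier candidate set, which is not reproduced here.

```python
import math


def max_min_init(points, k, first=0):
    """Pick k initial centers with the generic max-min distance idea.

    Illustrative sketch only, not the paper's MFK-means procedure.
    `first` is the index of the (assumed) starting center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    chosen = [first]
    # nearest[i] = distance from point i to its closest chosen center
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        # The next center is the point whose nearest center is farthest away.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return [points[i] for i in chosen]


# Hypothetical sample data: two tight pairs and one distant point.
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
print(max_min_init(pts, 3))  # [(0, 0), (10, 1), (5, 8)]
```

Because isolated points are, by construction, far from everything else, a plain max-min rule tends to select them as centers. That is presumably why the abstract pairs this step with outlier detection.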