Minimum variance and range-based k-means optimization algorithm-Rdk-means algorithm
Authors: Hao Wang, Mengyao Wu, Xiaoxia Lin, Jiashuai Zong
Venue: 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC)
Published: 2022-03-04
DOI: 10.1109/ITOEC53115.2022.9734450 (https://doi.org/10.1109/ITOEC53115.2022.9734450)
Citations: 0
Abstract
Traditional k-means initializes its centroids at random, which makes the algorithm unstable. To address this, Rdk-means (Range and density k-means) is proposed as an improved clustering algorithm. It introduces the concept of global density to select sample points: the centroids are initialized by choosing the sample points with the maximum density, spread apart from one another by a certain distance. In addition, the concept of spatial location is introduced into the sample-partitioning step to filter, for each sample point, the centroids that lie closer to it. This avoids unnecessary point-to-centroid distance calculations and offsets the longer clustering time caused by the centroid-initialization process. Experimental validation and analysis show that the algorithm improves clustering quality and stability.
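The abstract does not give the exact definitions of global density, the separation distance, or the spatial filter, so the following is only a minimal Python sketch under assumed definitions, not the authors' implementation. It assumes that a point's density is the number of samples within a fixed `radius`, that initial centroids must be at least `min_sep` apart, and that the filter keeps, for each sample, its `n_candidates` nearest initial centroids. The function names and all three parameters are hypothetical.

```python
import numpy as np


def density_spaced_init(X, k, radius, min_sep):
    """Pick k initial centroids: densest samples first, each >= min_sep apart.

    Assumption: density of a sample = number of samples within `radius` of it.
    """
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    density = (dists <= radius).sum(axis=1)
    order = np.argsort(-density, kind="stable")  # densest first

    chosen = []
    for i in order:
        # Accept a sample only if it is far enough from every chosen centroid.
        if all(dists[i, j] >= min_sep for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    if len(chosen) < k:
        raise ValueError("min_sep is too large to place k centroids")
    return X[chosen].copy()


def kmeans_with_candidates(X, init_centroids, n_candidates=2, n_iter=20):
    """Lloyd-style iterations that only measure distances to nearby centroids.

    Assumption: each sample's candidate list is its n_candidates nearest
    initial centroids, computed once and reused in every iteration.
    """
    centroids = init_centroids.copy()
    n = len(X)

    # One full distance pass to build the per-sample candidate lists.
    full = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    cand = np.argsort(full, axis=1)[:, :n_candidates]  # shape (n, m)

    labels = cand[:, 0]
    for _ in range(n_iter):
        # Distances to candidate centroids only, shape (n, m).
        d = np.linalg.norm(X[:, None, :] - centroids[cand], axis=2)
        labels = cand[np.arange(n), d.argmin(axis=1)]

        # Recompute each centroid as the mean of its assigned samples.
        new = centroids.copy()
        for c in range(len(centroids)):
            members = X[labels == c]
            if len(members):
                new[c] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels
```

Calling `density_spaced_init(X, k, radius, min_sep)` and passing its result to `kmeans_with_candidates` gives a deterministic run for fixed inputs, which is the stability property the abstract aims for. Fixing the candidate lists after initialization is a simplification chosen here for brevity; it trades exactness for fewer distance computations.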