All-pairwise squared distances lead to more balanced clustering

Applied Computing and Intelligence, Pub Date: 1900-01-01, DOI: 10.3934/aci.2023006

Mikko I. Malinen, P. Fränti


Citations: 1

Abstract

In clustering, a commonly used cost function involves calculating all-pairwise squared distances. In this paper, we formulate the cost function using mean squared error and show that this leads to more balanced clustering than centroid-based distance functions, such as the sum of squared distances in $k$-means. The clustering method is formulated as a cut-based approach, more intuitively called Squared cut (Scut). We introduce an algorithm for the problem that is faster than the existing one based on the Stirling approximation. Our algorithm is a sequential variant of a local search algorithm. Experiments show that the proposed approach provides better overall optimization of both mean squared error and cluster balance than existing methods.
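
To illustrate why an all-pairwise cost can favor balanced clusters, the sketch below relies on a standard identity that is not quoted from the paper: for a cluster of n points, the sum of squared distances over all unordered pairs equals n times the sum of squared distances to the centroid. A pairwise cost therefore weights each cluster's squared error by its size, so large clusters are penalized more heavily than under the plain k-means sum. The function names and the toy data are hypothetical and chosen only for this illustration; this is not the authors' Scut algorithm.

```python
from itertools import combinations


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points):
    """Sum of squared distances to the centroid (k-means cost of one cluster)."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def pairwise_cost(points):
    """Sum of squared distances over all unordered pairs in one cluster."""
    return sum(sq_dist(a, b) for a, b in combinations(points, 2))


def total(cost, clusters):
    """Apply a per-cluster cost to every cluster and add the results."""
    return sum(cost(cluster) for cluster in clusters)


# Toy data (assumed, for illustration only): six points on a line.
data = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
balanced = [data[:3], data[3:]]    # cluster sizes 3 and 3
unbalanced = [data[:1], data[1:]]  # cluster sizes 1 and 5

for name, clusters in (("balanced", balanced), ("unbalanced", unbalanced)):
    print(name, total(sse, clusters), total(pairwise_cost, clusters))
```

On this toy data, moving from the balanced to the unbalanced partition raises the centroid-based cost from 4 to 10, while the pairwise cost rises from 12 to 50. The relative penalty for imbalance is larger under the pairwise cost.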

