A new formulation of sparse multiple kernel $k$-means clustering and its applications
Authors: Wentao Qu, Xianchao Xiu, Jun Sun, Lingchen Kong
Journal: Statistical Analysis and Data Mining: The ASA Data Science Journal
Publication date: 2023-04-18
DOI: 10.1002/sam.11621 (https://doi.org/10.1002/sam.11621)
Citations: 0
Abstract
Multiple kernel $k$-means (MKKM) clustering has been an important research topic in statistical machine learning and data mining over the last few decades. MKKM combines a group of prespecified base kernels to improve clustering performance. Although many efforts have been made to improve MKKM further, existing work does not sufficiently consider the potential structure of the partition matrix. In this paper, we propose a novel sparse multiple kernel $k$-means (SMKKM) clustering method that introduces an $\ell_1$-norm to induce sparsity in the partition matrix. We then design an efficient alternating algorithm with a curve search technique. More importantly, the convergence and complexity analyses of the designed algorithm are established based on the optimality conditions of SMKKM. Finally, extensive numerical experiments on synthetic and benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods in terms of clustering performance and robustness.
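For orientation, the sketch below illustrates the baseline MKKM alternating scheme that the abstract builds on; it is not the authors' SMKKM. The abstract does not state the sparse formulation or the curve search update, so the sparsity-inducing norm term is omitted entirely. The sketch assumes the common relaxed MKKM objective from the literature, min over H and w of Tr(K_w (I - H H^T)) with K_w = sum_p w_p^2 K_p, H^T H = I, w >= 0 and sum_p w_p = 1. The names `rbf_kernel` and `mkkm`, the squared-weight combination, and the closed-form weight update are assumptions for illustration.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel matrix; used here as a hypothetical base kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def mkkm(kernels, k, n_iter=20, tol=1e-8):
    """Baseline MKKM by alternating minimization (assumed standard form).

    kernels: list of (n, n) symmetric PSD base kernel matrices.
    Returns the relaxed partition matrix H (n, k), the kernel weights w,
    and the final objective value.
    """
    m = len(kernels)
    w = np.full(m, 1.0 / m)  # start from uniform kernel weights
    prev = np.inf
    for _ in range(n_iter):
        # Step 1: fix w, update H as the top-k eigenvectors of the combined kernel.
        K = sum(wp ** 2 * Kp for wp, Kp in zip(w, kernels))
        _, vecs = np.linalg.eigh(K)  # eigenvalues in ascending order
        H = vecs[:, -k:]

        # Step 2: fix H, update w in closed form.
        # a_p = Tr(K_p (I - H H^T)); minimizing sum_p w_p^2 a_p on the simplex
        # gives w_p proportional to 1 / a_p.
        a = np.array([np.trace(Kp) - np.trace(H.T @ Kp @ H) for Kp in kernels])
        a = np.maximum(a, 1e-12)  # guard against division by zero
        w = (1.0 / a) / np.sum(1.0 / a)

        obj = float(np.sum(w ** 2 * a))
        if prev - obj < tol:
            break
        prev = obj
    return H, w, obj
```

In this relaxed setting, discrete cluster labels are usually obtained afterwards by running ordinary k-means on the rows of `H`. A sparse variant in the spirit of the abstract would add a penalty on the entries of `H`, which makes the H-step no longer a plain eigenvalue problem; how the authors handle that step is not described here.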