Constrained clustering with weak label prior

IF 4.6 | CAS Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Frontiers of Computer Science Pub Date : 2023-12-13 DOI:10.1007/s11704-023-3355-7

Jing Zhang, Ruidong Fan, Hong Tao, Jiacheng Jiang, Chenping Hou


Citations: 0

Abstract

Clustering is widely used in data mining. Embedding weak label priors into clustering has been shown to improve its performance. Previous research has mainly focused on a single type of prior. However, in many real scenarios, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratios, are easily obtained or already available. How to incorporate both to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), which is an integrated framework. Within a unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding, and the label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace the cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. In addition to theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP on several benchmark datasets and demonstrate its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of the proposed approach.
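The two kinds of prior named in the abstract can be illustrated with a small sketch. This is not the CWLP algorithm, whose objective and optimization are not given on this page. It is a simplified stand-in under stated assumptions: a fixed Gaussian similarity matrix, a constraint matrix Q (+1 for must-link, -1 for cannot-link) subtracted from the graph Laplacian as the regularizer, and, in place of spectral rotation, a size-constrained assignment solved with the Hungarian method. The function names and the parameters `lam`, `sigma`, and `ratios` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def constrained_embedding(X, must_link, cannot_link, k, lam=0.1, sigma=1.0):
    """Spectral embedding with a pairwise-constraint regularizer (illustrative)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))       # assumed Gaussian similarity
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W                  # unnormalized graph Laplacian
    Q = np.zeros_like(W)                       # +1 must-link, -1 cannot-link
    for i, j in must_link:
        Q[i, j] = Q[j, i] = 1.0
    for i, j in cannot_link:
        Q[i, j] = Q[j, i] = -1.0
    # min tr(F^T (L - lam * Q) F) s.t. F^T F = I: take the k smallest eigenvectors
    _, vecs = np.linalg.eigh(L - lam * Q)
    return vecs[:, :k]


def ratio_constrained_assign(F, ratios, n_iter=20):
    """Assign rows of F to clusters whose sizes follow `ratios` (illustrative).

    Assumes `ratios` sums to 1 and every cluster receives at least one point.
    """
    n, k = F.shape[0], len(ratios)
    sizes = np.floor(np.asarray(ratios) * n).astype(int)
    sizes[: n - sizes.sum()] += 1              # hand out the rounding remainder
    slots = np.repeat(np.arange(k), sizes)     # one slot per point, grouped by cluster
    # Deterministic farthest-point initialization of the centroids
    centers = [F[0]]
    for _ in range(1, k):
        d = np.min([((F - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(F[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        cost = ((F[:, None, :] - centers[None, :, :]) ** 2).sum(-1)   # n x k
        rows, cols = linear_sum_assignment(cost[:, slots])            # n x n matching
        new_labels = np.empty(n, dtype=int)
        new_labels[rows] = slots[cols]
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        centers = np.array([F[labels == c].mean(0) for c in range(k)])
    return labels
```

Calling `constrained_embedding` followed by `ratio_constrained_assign(F, [0.5, 0.5])` returns labels whose cluster sizes match the requested proportions exactly. The matching step costs O(n^3), so this sketch only suits small datasets.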




Source journal

Frontiers of Computer Science (COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore: 8.60
Self-citation rate: 2.40%
Articles published: 799
Review time: 6-12 weeks

About the journal: Frontiers of Computer Science aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. The journal publishes research papers and review articles on a wide range of topics, including architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, and interdisciplinary research. The journal especially encourages papers from newly emerging and multidisciplinary areas, papers reflecting international trends in research and development, and papers on special topics reporting progress made by Chinese computer scientists.