Jing Zhang, Ruidong Fan, Hong Tao, Jiacheng Jiang, Chenping Hou
{"title":"Constrained clustering with weak label prior","authors":"Jing Zhang, Ruidong Fan, Hong Tao, Jiacheng Jiang, Chenping Hou","doi":"10.1007/s11704-023-3355-7","DOIUrl":null,"url":null,"abstract":"<p>Clustering is widely exploited in data mining. It has been proved that embedding weak label prior into clustering is effective to promote its performance. Previous researches mainly focus on only one type of prior. However, in many real scenarios, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate them to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), which is an integrated framework. Within the unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding and label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace a cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. Except for the theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP through several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of our proposed approach.</p>","PeriodicalId":12640,"journal":{"name":"Frontiers of Computer Science","volume":"34 1","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2023-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers of Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11704-023-3355-7","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Clustering is widely exploited in data mining. It has been proved that embedding weak label prior into clustering is effective to promote its performance. Previous researches mainly focus on only one type of prior. However, in many real scenarios, two kinds of weak label prior information, e.g., pairwise constraints and cluster ratio, are easily obtained or already available. How to incorporate them to improve clustering performance is important but rarely studied. We propose a novel constrained Clustering with Weak Label Prior method (CWLP), which is an integrated framework. Within the unified spectral clustering model, the pairwise constraints are employed as a regularizer in spectral embedding and label proportion is added as a constraint in spectral rotation. To approximate a variant of the embedding matrix more precisely, we replace a cluster indicator matrix with its scaled version. Instead of fixing an initial similarity matrix, we propose a new similarity matrix that is more suitable for deriving clustering results. Except for the theoretical convergence and computational complexity analyses, we validate the effectiveness of CWLP through several benchmark datasets, together with its ability to discriminate suspected breast cancer patients from healthy controls. The experimental evaluation illustrates the superiority of our proposed approach.
期刊介绍:
Frontiers of Computer Science aims to provide a forum for the publication of peer-reviewed papers to promote rapid communication and exchange between computer scientists. The journal publishes research papers and review articles in a wide range of topics, including: architecture, software, artificial intelligence, theoretical computer science, networks and communication, information systems, multimedia and graphics, information security, interdisciplinary, etc. The journal especially encourages papers from new emerging and multidisciplinary areas, as well as papers reflecting the international trends of research and development and on special topics reporting progress made by Chinese computer scientists.