Enhanced Adjacency-Constrained Hierarchical Clustering Using Fine-Grained Pseudo Labels
Authors: Jie Yang; Chin-Teng Lin
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 3, pp. 2481-2492
Published: 2024-04-02
DOI: 10.1109/TETCI.2024.3367811
URL: https://ieeexplore.ieee.org/document/10488478/
Impact factor: 5.3; JCR: Q1 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Hierarchical clustering can provide partitions at different levels of granularity. However, most existing hierarchical clustering techniques operate in the original feature space of the data, which may suffer from overlap, sparseness, or other undesirable characteristics, resulting in noncompetitive performance. In deep clustering, learning representations from pseudo labels has recently become a research hotspot. Yet most existing approaches employ coarse-grained pseudo labels, which may be noisy or incorrect, so the learned feature space does not yield a competitive model. In this paper, we introduce the idea of fine-grained labels from supervised learning into unsupervised clustering, giving rise to the enhanced adjacency-constrained hierarchical clustering (ECHC) model. The full framework comprises four steps. First, adjacency-constrained hierarchical clustering (CHC) is used to produce relatively pure fine-grained pseudo labels. Second, those fine-grained pseudo labels are used to train a shallow multilayer perceptron to generate good representations. Third, the representation of each sample in the learned space is used to construct a similarity matrix. Fourth, CHC is used to generate the final partition from the similarity matrix. The experimental results show that the proposed ECHC framework not only outperforms 14 shallow clustering methods on eight real-world datasets but also surpasses current state-of-the-art deep clustering models on six real-world datasets. In addition, on five real-world datasets, ECHC achieves results comparable to those of supervised algorithms.
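The four steps listed in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: CHC is not available in scikit-learn or SciPy, so the sketch substitutes kNN-constrained Ward linkage for the first step and plain (unconstrained) average linkage for the last one. All function names, the hidden-layer size, the neighbor count, and the cluster counts are hypothetical choices made only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import AgglomerativeClustering
from sklearn.neighbors import kneighbors_graph
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def fine_grained_pseudo_labels(X, n_fine, n_neighbors=10):
    """Step 1 (stand-in for CHC): over-cluster X into many small groups.

    Merges are restricted to kNN-adjacent samples, which only loosely
    mimics an adjacency constraint.
    """
    connectivity = kneighbors_graph(X, n_neighbors=n_neighbors, include_self=False)
    model = AgglomerativeClustering(
        n_clusters=n_fine, linkage="ward", connectivity=connectivity
    )
    return model.fit_predict(X)


def learn_representation(X, pseudo_labels, n_hidden=32, seed=0):
    """Step 2: fit a one-hidden-layer MLP on the pseudo labels.

    Returns the hidden-layer (ReLU) activations as the learned features.
    """
    mlp = MLPClassifier(
        hidden_layer_sizes=(n_hidden,),
        activation="relu",
        max_iter=500,
        random_state=seed,
    )
    mlp.fit(X, pseudo_labels)
    # Recompute the hidden activations from the fitted first-layer weights.
    return np.maximum(0.0, X @ mlp.coefs_[0] + mlp.intercepts_[0])


def cosine_similarity_matrix(H, eps=1e-12):
    """Step 3: pairwise cosine similarity between learned representations."""
    Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + eps)
    return Hn @ Hn.T


def final_partition(S, n_final):
    """Step 4 (stand-in for CHC): average linkage on the distance 1 - S."""
    D = np.clip(1.0 - S, 0.0, None)
    D = (D + D.T) / 2.0          # remove floating-point asymmetry
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")
    # Cut the dendrogram so that at most n_final clusters remain.
    return fcluster(Z, t=n_final, criterion="maxclust")


def echc_sketch(X, n_fine, n_final, seed=0):
    """Chain the four steps; returns (final labels, similarity matrix)."""
    X = StandardScaler().fit_transform(X)
    pseudo = fine_grained_pseudo_labels(X, n_fine)
    H = learn_representation(X, pseudo, seed=seed)
    S = cosine_similarity_matrix(H)
    return final_partition(S, n_final), S


if __name__ == "__main__":
    from sklearn.datasets import make_blobs

    X_demo, _ = make_blobs(n_samples=150, centers=3, random_state=0)
    labels, S = echc_sketch(X_demo, n_fine=15, n_final=3)
    print("clusters found:", len(set(labels.tolist())))
```

The number of fine-grained groups (`n_fine`) is deliberately larger than the final number of clusters (`n_final`): the point of the first step, as the abstract describes it, is to obtain many small and relatively pure groups rather than a final partition. How the paper actually chooses these quantities is not stated in the abstract.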
About the journal:
The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys.
TETCI is an electronic-only publication. TETCI publishes six issues per year.
Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, Brain Computer Interface, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.