Clustering image data with a fixed embedding
Yan-Bin Chen, Khong-Loon Tiong, Chen-Hsiang Yeang
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), published 2022-12-01
DOI: https://doi.org/10.1109/ICMLA55696.2022.00148
Abstract
Clustering unlabeled image data with deep neural network (DNN) models is under active investigation. Most existing approaches transform the data through an embedding and then cluster the embedded data, with the embedding itself learned to fit the data. In some applications, however, the embedding model is given in advance, owing to concerns about generalizability, transferability, privacy, and security. Despite rapid progress in self-supervised learning, clustering data with a fixed embedding has rarely been explored. We propose a Merge & Expand (ME) algorithm that clusters image data using a fixed embedding and a DNN classification model. ME achieves accuracy comparable to that of some state-of-the-art algorithms. It also separates "clean" images, whose geometric relations in the embedded space are compatible with their cluster structure, from "unclean" images, whose geometric relations are incompatible with it. Finally, we validate ME on three datasets and discuss potential extensions.
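The general setting described above (a frozen embedding, clustering in the embedded space, and a check of whether each point's neighbourhood agrees with its cluster) can be illustrated with a minimal Python sketch. This is not the ME algorithm, whose details the abstract does not give. The `embed` function is a hypothetical stand-in for a pretrained model, plain k-means is swapped in as the clustering step, and the "clean" test (all nearest neighbours share the point's cluster label) is only one assumed reading of "compatible with the cluster structure". The toy images are synthetic.

```python
import math
import random


def embed(image):
    """Hypothetical fixed embedding: mean intensity of each half of a flat image.
    Stands in for a frozen pretrained model; it is never trained or updated."""
    half = len(image) // 2
    return (sum(image[:half]) / half, sum(image[half:]) / (len(image) - half))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (an assumption, not ME) with deterministic
    farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def clean_mask(points, labels, n_neighbors=3):
    """Assumed 'clean' criterion: True where a point's nearest neighbours
    in the embedded space all carry the same cluster label as the point."""
    mask = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        mask.append(all(labels[j] == labels[i] for j in order[:n_neighbors]))
    return mask


# Synthetic toy data: 8-pixel "images", dark-left/bright-right or the reverse.
rng = random.Random(0)


def toy_image(left, right):
    noise = lambda: rng.uniform(-0.1, 0.1)
    return [left + noise() for _ in range(4)] + [right + noise() for _ in range(4)]


images = [toy_image(0.2, 0.8) for _ in range(10)] + [
    toy_image(0.8, 0.2) for _ in range(10)
]
points = [embed(img) for img in images]  # the embedding is applied, never fitted
labels = kmeans(points, k=2)
mask = clean_mask(points, labels)
print("cluster labels:", labels)
print("clean points:", sum(mask), "of", len(mask))
```

On real data the two toy groups would not be so well separated, and points near a cluster boundary would be the ones flagged as not clean under this assumed criterion.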