SDCG: Silhouette-based Deep Clustering with GNN for Improved Graph Node Clustering
Authors: Hyesoo Shin, Eunjo Jang, Sojeong Kim, Ki Yong Lee
Venue: 2023 IEEE/ACIS 21st International Conference on Software Engineering Research, Management and Applications (SERA)
Publication date: 2023-05-23
DOI: 10.1109/SERA57763.2023.10197683
URL: https://doi.org/10.1109/SERA57763.2023.10197683
Citations: 0
Abstract
Graph Neural Networks (GNNs) are powerful tools for analyzing graph-structured data in many fields because of their strong expressive power. They use a message-passing mechanism to update node embeddings, which are then used for tasks such as node classification and link prediction. Recently, node embeddings have also been used for graph node clustering, which aims to group similar nodes based on their features and the graph topology. However, conventional node clustering methods are limited in that the GNN is trained only to generate node embeddings, without considering the ultimate clustering objective. To address this issue, a technique called "Deep Clustering" has been proposed, which integrates the node embedding and clustering stages. This requires defining a new loss function so that the GNN loss and the clustering loss are minimized simultaneously. Our proposed loss function applies the Silhouette coefficient so that it accounts not only for distances within clusters but also for distances between clusters, which enables better clustering results. In this paper, we propose Silhouette-based Deep Clustering with GNN (SDCG), which clusters the nodes of a graph more effectively by iteratively training the embedding model to produce embedding vectors that yield improved clustering results. Through extensive experiments, we demonstrate that SDCG outperforms the conventional approach of performing embedding and clustering independently.
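To illustrate how a Silhouette coefficient captures both within-cluster and between-cluster distance, the following is a minimal sketch of the standard definition, s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from node i to the other members of its cluster and b(i) is the smallest mean distance from node i to the members of any other cluster. The abstract does not give the exact form of the SDCG loss, so the function names, the Euclidean distance, and the choice of "1 minus the mean silhouette" as a clustering loss are assumptions made for illustration only, not the authors' implementation. The GNN and its own loss term are omitted.

```python
import math
from typing import Dict, List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def silhouette_scores(
    embeddings: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[float]:
    """Per-node Silhouette coefficient s(i) = (b - a) / max(a, b)."""
    # Group node indices by their cluster label.
    clusters: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the Silhouette coefficient needs at least two clusters")

    scores: List[float] = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            # Common convention: a node alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other nodes in the same cluster (cohesion).
        a = sum(
            euclidean(embeddings[i], embeddings[j]) for j in own if j != i
        ) / (len(own) - 1)
        # b: smallest mean distance to the nodes of any other cluster (separation).
        b = min(
            sum(euclidean(embeddings[i], embeddings[j]) for j in members)
            / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


def silhouette_loss(
    embeddings: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Hypothetical clustering loss: 1 - mean silhouette (lower is better)."""
    scores = silhouette_scores(embeddings, labels)
    return 1.0 - sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-D "embeddings": two tight groups that are far apart.
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
    print(silhouette_loss(points, [0, 0, 1, 1]))  # assignment matching the groups
    print(silhouette_loss(points, [0, 1, 0, 1]))  # assignment mixing the groups
```

Because each score lies in [-1, 1], this assumed loss lies in [0, 2], and an assignment that matches the two groups yields a lower value than one that mixes them. Using such a term during training would additionally require a differentiable implementation in an automatic-differentiation framework, combined with the GNN loss; how SDCG weights and combines the two terms is not stated in the abstract.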