gSkeletonClu: Density-Based Network Clustering via Structure-Connected Tree Division or Agglomeration

Heli Sun, Jianbin Huang, Jiawei Han, Hongbo Deng, Peixiang Zhao, B. Feng
DOI: 10.1109/ICDM.2010.69. Published in the 2010 IEEE International Conference on Data Mining (ICDM), 2010-12-13.
Citations: 75

Abstract

Community detection is an important task for mining the structure and function of complex networks. Many previous approaches have difficulty detecting communities of arbitrary size and shape, and they cannot identify hubs and outliers. A recently proposed network clustering algorithm, SCAN, is effective and overcomes this difficulty. However, it depends on a sensitive parameter, the minimum similarity threshold $\varepsilon$, and provides no automated way to choose it. In this paper, we propose a novel density-based network clustering algorithm called gSkeletonClu (graph-skeleton based clustering). By projecting a network onto its Core-Connected Maximal Spanning Tree (CCMST), we convert the network clustering problem into finding core-connected components in the CCMST. We find that all possible values of the parameter $\varepsilon$ lie among the edge weights of the corresponding CCMST. By means of divisive or agglomerative clustering of the tree, our algorithm finds the optimal parameter $\varepsilon$ and detects communities, hubs and outliers in large-scale undirected networks automatically, without any user interaction. Extensive experiments on both real-world and synthetic networks demonstrate the superior performance of gSkeletonClu over the baseline methods.
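The abstract does not spell out SCAN's definitions or the exact CCMST construction, so the following Python sketch is illustrative only. It assumes SCAN's structural similarity $\sigma(u,v) = |\Gamma(u) \cap \Gamma(v)| / \sqrt{|\Gamma(u)||\Gamma(v)|}$ over closed neighborhoods and SCAN's core condition (a vertex is a core at $\varepsilon$ if at least $\mu$ members of $\Gamma(v)$ have $\sigma \ge \varepsilon$). The edge weight $w(u,v) = \min(\sigma(u,v), CS(u), CS(v))$ is an assumption. Here $CS(v)$ is the largest $\varepsilon$ at which $v$ is still a core. The weight is chosen so that $w(u,v) \ge \varepsilon$ exactly when $u$ and $v$ are adjacent cores with $\sigma \ge \varepsilon$. It is not necessarily the paper's definition, and all function names are hypothetical. A maximum spanning forest preserves, for every threshold, which vertices are joined by edges of at least that weight. The grouping of cores at any $\varepsilon$ can therefore be read off the tree alone. That grouping changes only when $\varepsilon$ crosses a tree edge weight, which is the intuition behind searching for $\varepsilon$ among the tree's edge weights.

```python
import math
from collections import defaultdict


def structural_similarity(adj, u, v):
    """SCAN structural similarity over closed neighborhoods Γ(u), Γ(v)."""
    gu, gv = adj[u] | {u}, adj[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))


def core_similarity(adj, v, mu):
    """CS(v): largest ε at which v is a core, i.e. the mu-th largest σ(v, w), w in Γ(v)."""
    sims = sorted((structural_similarity(adj, v, w) for w in adj[v] | {v}), reverse=True)
    return sims[mu - 1] if len(sims) >= mu else 0.0


def _find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def core_connected_spanning_tree(adj, mu):
    """Hypothetical helper: maximum spanning forest (Kruskal) under the ASSUMED
    weight w(u, v) = min(σ(u, v), CS(u), CS(v)). Returns (tree_edges, CS map)."""
    cs = {v: core_similarity(adj, v, mu) for v in adj}
    edges = sorted(
        ((min(structural_similarity(adj, u, v), cs[u], cs[v]), u, v)
         for u in adj for v in adj[u] if u < v),
        reverse=True,  # heaviest edges first -> maximum spanning forest
    )
    parent = {v: v for v in adj}
    tree = []
    for w, u, v in edges:
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:  # edge links two different components: keep it
            parent[ru] = rv
            tree.append((u, v, w))
    return tree, cs


def candidate_thresholds(tree):
    """Distinct tree edge weights, highest first: where the core grouping can change."""
    return sorted({w for _, _, w in tree}, reverse=True)


def core_clusters(tree, cs, eps):
    """Group the cores (CS(v) >= eps) via tree edges of weight >= eps.
    Border vertices, hubs and outliers are NOT assigned in this sketch."""
    cores = [v for v in cs if cs[v] >= eps]
    parent = {v: v for v in cores}
    for u, v, w in tree:
        if w >= eps:  # w >= eps implies both endpoints are cores
            parent[_find(parent, u)] = _find(parent, v)
    groups = defaultdict(list)
    for v in cores:
        groups[_find(parent, v)].append(v)
    return sorted(sorted(g) for g in groups.values())


# Toy graph: two 4-cliques {0,1,2,3} and {4,5,6,7} joined by the bridge edge 3-4.
adj = {v: set() for v in range(8)}
for a, b in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]:
    adj[a].add(b)
    adj[b].add(a)

tree, cs = core_connected_spanning_tree(adj, mu=3)
print([round(w, 3) for w in candidate_thresholds(tree)])  # [1.0, 0.894, 0.4]
print(core_clusters(tree, cs, 0.5))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(core_clusters(tree, cs, 0.3))  # [[0, 1, 2, 3, 4, 5, 6, 7]]
```

On the toy graph, $\varepsilon = 0.5$ separates the two cliques, and $\varepsilon = 0.3$ merges them across the bridge. The abstract describes divisive or agglomerative tree clustering to pick the optimal $\varepsilon$. Such a search would walk these candidate thresholds under some quality criterion, which this sketch does not attempt to reproduce.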