Modularity-based Hypergraph Clustering: Random Hypergraph Model, Hyperedge-cluster Relation, and Computation

Proceedings of the ACM on Management of Data. Pub Date: 2023-11-13. DOI: 10.1145/3617335

Zijin Feng, Miao Qiao, Hong Cheng



Abstract

A graph models the connections among objects. One important graph analytical task is clustering, which partitions a data graph into clusters with dense inner-cluster connections. One line of clustering methods maximizes a function called modularity. Modularity-based clustering is widely adopted on dyadic graphs because of its scalability and clustering quality, both of which depend heavily on the choice of random graph model. The random graph model decides not only which clustering is preferred, since modularity measures the quality of a clustering by its alignment to the edges of a random graph, but also the cost of computing that alignment. Existing random hypergraph models either measure the hyperedge-cluster alignment in an All-Or-Nothing (AON) manner, losing important group-wise information, or introduce expensive alignment computation that prevents the clustering from scaling up. This paper proposes a new random hypergraph model called the Hyperedge Expansion Model (HEM); a non-AON hypergraph modularity function based on HEM, called Partial Innerclusteredge modularity (PI); a clustering algorithm called Partial Innerclusteredge Clustering (PIC) that optimizes PI; and novel computation optimizations. PIC is a scalable modularity-based hypergraph clustering algorithm that effectively captures the non-AON hyperedge-cluster relation. Our experiments show that PIC outperforms eight state-of-the-art methods on real-world hypergraphs in both clustering quality and scalability, and is up to five orders of magnitude faster than the baseline methods.
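
As background for the terms used in the abstract, the sketch below computes the standard Newman-Girvan modularity of a dyadic graph (whose null model is a degree-preserving random graph) and contrasts an All-Or-Nothing count of inner-cluster hyperedges with a partial one. It does not implement HEM, PI, or PIC, whose definitions are not given in the abstract. The function names and the `majority_fraction` measure are hypothetical, chosen only to illustrate what a non-AON hyperedge-cluster relation could look like.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman-Girvan modularity of an undirected, unweighted dyadic graph.

    edges: list of (u, v) pairs; cluster_of: dict mapping node -> cluster label.
    Q = sum over clusters c of  inner_c / m - (volume_c / (2 m))^2
    """
    m = len(edges)
    inner = defaultdict(int)   # edges with both endpoints in the cluster
    volume = defaultdict(int)  # sum of degrees of the cluster's nodes
    for u, v in edges:
        volume[cluster_of[u]] += 1
        volume[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            inner[cluster_of[u]] += 1
    return sum(inner[c] / m - (volume[c] / (2 * m)) ** 2 for c in volume)


def aon_inner_count(hyperedges, cluster_of):
    """All-Or-Nothing: a hyperedge counts only if all its nodes share one cluster."""
    return sum(1 for e in hyperedges if len({cluster_of[v] for v in e}) == 1)


def majority_fraction(hyperedge, cluster_of):
    """Hypothetical partial measure (NOT the paper's PI): the share of the
    hyperedge's nodes that fall in its most common cluster."""
    counts = defaultdict(int)
    for v in hyperedge:
        counts[cluster_of[v]] += 1
    return max(counts.values()) / len(hyperedge)


# Two triangles joined by a single edge, clustered as one triangle each.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cluster_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, cluster_of)  # 2 * (3/7 - 1/4) = 5/14

# The second hyperedge has 3 of its 4 nodes in cluster "a".
hyperedges = [(0, 1, 2), (0, 1, 2, 3)]
aon = aon_inner_count(hyperedges, cluster_of)            # 1: the second is ignored
partial = majority_fraction(hyperedges[1], cluster_of)   # 0.75
```

Under the AON count, the hyperedge `(0, 1, 2, 3)` contributes nothing even though three of its four nodes sit in one cluster. This is the kind of group-wise information the abstract says AON models lose.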

