Longlong Lin, Tao Jia, Zeli Wang, Jin Zhao, Rong-Hua Li
arXiv - CS - Computational Complexity, 2024-06-11. DOI: arxiv-2406.07357 (https://doi.org/arxiv-2406.07357)
PSMC: Provable and Scalable Algorithms for Motif Conductance Based Graph Clustering
Higher-order graph clustering aims to partition the graph using frequently
occurring subgraphs. Motif conductance is one of the most promising
higher-order graph clustering models due to its strong interpretability.
However, existing motif-conductance-based graph clustering algorithms mostly
rely on a seminal two-stage reweighting framework, which must
enumerate all motif instances to obtain an edge-weighted graph for
partitioning. This framework has two major defects: (1) It
only provides a quadratic bound for motifs with three vertices, and whether
clustering quality is provable for other motifs remains an open
question. (2) Enumerating motif instances incurs prohibitively
high costs on large motifs or large dense graphs due to combinatorial
explosion. In addition, expensive spectral clustering or local graph diffusion on
the edge-weighted graph leaves existing methods unable to handle massive
graphs with millions of nodes. To overcome these limitations, we propose a
Provable and Scalable Motif Conductance algorithm, PSMC, which has a fixed,
motif-independent approximation ratio for any motif. Specifically, PSMC first
defines a new vertex metric, Motif Resident, based on the given motif, which can
be computed locally. Then, it iteratively deletes the vertex with the smallest
motif resident value, using novel dynamic update techniques to do so efficiently.
Finally, it outputs the locally optimal result found during this iterative
process. To further boost efficiency, we propose several effective bounds to
estimate the motif resident value of each vertex, which can greatly reduce
computational costs. Empirical results show that our proposed algorithms
achieve a 3.2-32 times speedup and improve quality by at least 12 times over
the baselines.
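
The peeling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the motif is fixed to the triangle, motif conductance follows the standard definition (motif instances cut by S, divided by the smaller of the two motif volumes), and, because the abstract does not define Motif Resident, each vertex is scored by the number of surviving triangles that contain it as a stand-in. The names `build_adj`, `triangles`, `motif_conductance`, and `greedy_peel` are hypothetical. Scores are recomputed from scratch in every round, so none of the paper's dynamic updates or bounds are reproduced, and the sketch returns the best vertex set seen during the sweep.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj):
    """Enumerate each triangle once as an ordered triple u < v < w."""
    found = []
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            if u < v and w in adj[v]:
                found.append((u, v, w))
    return found


def motif_conductance(instances, subset):
    """Standard motif conductance: cut_M(S) / min(vol_M(S), vol_M(V - S))."""
    subset = set(subset)
    cut = vol_in = vol_out = 0
    for inst in instances:
        inside = sum(1 for x in inst if x in subset)
        vol_in += inside                # instance endpoints inside S
        vol_out += len(inst) - inside   # instance endpoints outside S
        if 0 < inside < len(inst):      # instance straddles the cut
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


def greedy_peel(adj):
    """Repeatedly delete the lowest-scoring vertex; keep the best set seen.

    Assumption: the score is the number of surviving triangles containing
    the vertex, a stand-in for the paper's Motif Resident metric.
    """
    instances = triangles(adj)
    remaining = set(adj)
    best_set, best_phi = None, float("inf")
    while len(remaining) > 1:
        score = {v: 0 for v in remaining}
        for inst in instances:
            if all(x in remaining for x in inst):
                for x in inst:
                    score[x] += 1
        # Ties are broken by vertex id so the sweep is deterministic.
        victim = min(remaining, key=lambda v: (score[v], v))
        remaining.remove(victim)
        phi = motif_conductance(instances, remaining)
        if phi < best_phi:
            best_phi, best_set = phi, set(remaining)
    return best_set, best_phi


if __name__ == "__main__":
    # Two 4-cliques joined by the single edge (3, 4).
    clique_a = list(combinations(range(4), 2))
    clique_b = list(combinations(range(4, 8), 2))
    graph = build_adj(clique_a + clique_b + [(3, 4)])
    print(greedy_peel(graph))
```

In the demo graph no triangle crosses the bridge edge, so the sweep ends up on one of the two cliques with triangle conductance 0. A practical implementation would avoid rescoring every vertex in each round, which is the cost the abstract's dynamic updates and bounds are meant to reduce.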