Definition and Goal of Graph Clustering - Motivation to Explore a New Algorithm

Yuching Lu, G. Chakraborty
Published in: 2019 IEEE 10th International Conference on Awareness Science and Technology (iCAST)
Publication date: 2019-10-01
DOI: 10.1109/ICAwST.2019.8923556 (https://doi.org/10.1109/ICAwST.2019.8923556)
Citations: 1

Abstract

In recent years, a popular method for mining evolving data has been to convert it to a network, in practice an adjacency matrix, in which data units are nodes and their relations or similarities are the link weights. The network is then partitioned into communities to explore information hidden in the data. Over the last 50 years, various graph partitioning algorithms have been proposed. Depending on the application, the optimization objective of the partitioning, and hence the resulting clusters, differ. Algorithms based on linear algebra, heuristics with various greedy optimization criteria, and agglomerative algorithms have been proposed to meet optimization criteria suited to the target network and application. In this work we propose a genetic algorithm (GA) based dynamic clustering algorithm. A GA for graph clustering with the modularity index as the fitness function was already proposed in our previous works. In this work, we propose a multimodal GA. The algorithm starts with the modularity index (Q) as the optimization criterion. Once it has converged, we add another term to the fitness function that balances the cardinalities of the partitions. The parameters are computed from the partitions obtained after the first stage of convergence. We continue to run the genetic search with the modified fitness function until a second convergence is achieved. In all our experiments, we achieved a better balance among cluster sizes, and in many experiments the modularity index also improved. For a highly modular graph, with Q ≥ 0.7, most of the algorithms produce the same result. When the optimum modularity index of the graph is low, a GA with only the modularity index as the optimization criterion usually converges to a local optimum. With the proposed modification, we could always find a clustering with an improved Q value. We ran popular partitioning algorithms on known real-world networks and found that the proposed algorithm found better partitionings, closest to reality.
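
The two-stage fitness described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the form of the balance term or say how its parameters are derived, so `size_imbalance` (the coefficient of variation of the cluster sizes) and the fixed `weight` are assumptions made here for illustration only. The `modularity` function follows the standard Newman definition of Q for an undirected, possibly weighted, adjacency matrix.

```python
import numpy as np


def modularity(adj, labels):
    """Newman modularity Q of a partition of an undirected graph.

    adj    : symmetric (weighted) adjacency matrix
    labels : labels[i] is the cluster id of node i
    """
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    two_m = adj.sum()  # sum of all entries equals 2m for a symmetric matrix
    if two_m == 0:
        return 0.0
    degrees = adj.sum(axis=1)
    same_cluster = labels[:, None] == labels[None, :]
    expected = np.outer(degrees, degrees) / two_m
    return float(((adj - expected) * same_cluster).sum() / two_m)


def size_imbalance(labels):
    """Hypothetical balance term: std / mean of cluster sizes (0 = balanced)."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    return float(counts.std() / counts.mean())


def fitness(adj, labels, stage, weight=0.1):
    """Stage 1: Q only. Stage 2: Q minus an assumed size-imbalance penalty.

    `weight` is a placeholder; the paper computes its parameters from the
    partition found at the first convergence, in a way not given here.
    """
    q = modularity(adj, labels)
    if stage == 1:
        return q
    return q - weight * size_imbalance(labels)


# Usage: two triangles joined by a single edge (nodes 2 and 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

balanced = [0, 0, 0, 1, 1, 1]
skewed = [0, 0, 0, 0, 0, 1]
print(fitness(adj, balanced, stage=1), fitness(adj, balanced, stage=2))
print(fitness(adj, skewed, stage=1), fitness(adj, skewed, stage=2))
```

In a GA, each chromosome would encode a `labels` vector, and the search would switch from `stage=1` to `stage=2` once the population has converged on Q. The balanced split is unaffected by the penalty in this sketch, while the skewed split scores lower in the second stage.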