A Comparative Analysis of Community Detection Agglomerative Technique Algorithms and Metrics on Citation Network

Q2 Computer Science

Annals of Emerging Technologies in Computing Pub Date : 2023-10-01 DOI:10.33166/aetic.2023.04.001

Sandeep Kumar Rachamadugu, Pushphavathi Thotadara Parameshwarappa


Citations: 0

Abstract

Social network analysis is a discipline that represents social relationships as a network of nodes and edges. Building a social network with clusters helps reveal the common characteristics or behaviour shared by a group. A community is a module obtained by partitioning the graph; communities are meant to represent actual social groups that share common characteristics. A citation network is a social network modelled as a directed graph in which one paper cites another. Citation networks assist researchers in choosing research directions and evaluating research impact. Constructing citation networks with communities helps the user identify similar documents that relate to one or more domains. This paper applies agglomerative community detection algorithms and metrics to a directed graph in order to determine the most influential nodes and groups of similar nodes. Constructing the communities involves two stages: generating the network with communities, and quantifying the network's performance. The strength and quality of a network are quantified with metrics such as modularity, normalized mutual information (NMI), betweenness centrality, and F-measure. The paper introduces community detection techniques and metrics suitable for a citation graph. In the field of community detection, it is common practice to categorize algorithms according to the mathematical techniques they employ and then compare them on benchmark graphs featuring a particular type of assortative community structure. The algorithms are applied to a sample citation sub-dataset extracted from DBLP, ACM, MAG, and some additional sources; it consists of 101 nodes (nc) and 621 edges (E), which form 64 communities. The key attributes in the dataset are id, title, abstract, and references. SLM uses local optimisation and scalability to improve community detection in complicated networks. Compared with traditional methods, the proposed LS-SLM algorithm increases modularity by 12.65%, NMI by 2.31%, betweenness centrality by 3.18%, and F-score by 4.05%. The SLM algorithm outperforms existing methods in finding significant and well-defined communities, making it a promising advance in community detection.
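Two of the metrics named in the abstract, modularity and NMI, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the directed (Leicht-Newman) form of modularity, which may differ from the variant used in the paper, and an NMI normalized by the arithmetic mean of the two entropies. The function names and the toy graph are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def directed_modularity(edges, community):
    """Directed modularity Q = sum_c [ e_c / m - (out_c * in_c) / m^2 ].

    edges: list of (citing, cited) pairs; community: dict node -> label.
    Assumption: Leicht-Newman directed form, unweighted edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = Counter()    # edges whose endpoints share a community
    out_deg = Counter()  # total out-degree per community
    in_deg = Counter()   # total in-degree per community
    for src, dst in edges:
        out_deg[community[src]] += 1
        in_deg[community[dst]] += 1
        if community[src] == community[dst]:
            intra[community[src]] += 1
    labels = set(community.values())
    return sum(intra[c] / m - out_deg[c] * in_deg[c] / (m * m) for c in labels)


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2 * I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions are a single cluster
    return 2 * mutual / (h_a + h_b)


# Hypothetical toy citation graph: two directed triangles joined by one edge.
toy_edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
toy_partition = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

if __name__ == "__main__":
    # For this toy graph Q works out to 18/49.
    print(directed_modularity(toy_edges, toy_partition))
    # Identical groupings under different label names.
    print(nmi(["a", "a", "a", "b", "b", "b"], [1, 1, 1, 0, 0, 0]))
```

Modularity rewards partitions in which citations fall inside communities more often than the node degrees alone would predict, while NMI compares a detected partition against a reference labelling and ignores how the labels themselves are named.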

Source journal

Annals of Emerging Technologies in Computing Computer Science-Computer Science (all)

CiteScore

3.50

Self-citation rate

0.00%

Articles published