A Comparative Analysis of Community Detection Agglomerative Technique Algorithms and Metrics on Citation Network

Q2 Computer Science
Sandeep Kumar Rachamadugu, Pushphavathi Thotadara Parameshwarappa
DOI: 10.33166/aetic.2023.04.001 (https://doi.org/10.33166/aetic.2023.04.001)
Publication date: 2023-10-01
Publication type: Journal Article
Citations: 0

Abstract

Social network analysis is a discipline that represents social relationships as a network of nodes and edges. Constructing a social network with clusters helps reveal the characteristics or behaviour that a group shares. Partitioning a graph into modules yields communities, which are meant to represent actual social groups with common characteristics. A citation network is a social network modelled as a directed graph in which one paper cites another. Citation networks help researchers choose research directions and evaluate research impact. Constructing a citation network with communities helps the user identify similar documents that relate to one or more domains. This paper applies agglomerative community detection algorithms and metrics to a directed graph in order to determine the most influential nodes and groups of similar nodes. Constructing the communities involves two stages: generating the network with communities, and quantifying the network's performance. The strength and quality of a network are quantified with metrics such as modularity, normalized mutual information (NMI), betweenness centrality, and F-measure. The paper introduces community detection techniques and metrics suited to a citation graph. In the field of community detection, it is common practice to categorize algorithms according to the mathematical techniques they employ and then compare them on benchmark graphs featuring a particular type of assortative community structure. The algorithms are applied to a sample citation subset extracted from DBLP, ACM, MAG and some additional sources; it consists of 101 nodes (nc) and 621 edges (E), and forms 64 communities. The key attributes in the dataset are id, title, abstract, and references. SLM uses local optimisation and scalability to improve community detection in complicated networks. Compared with traditional methods, the proposed LS-SLM algorithm increases modularity by 12.65%, NMI by 2.31%, betweenness centrality by 3.18%, and F-score by 4.05%. The SLM algorithm outperforms existing methods in finding significant and well-defined communities, making it a promising advance in community detection.
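Two of the metrics named in the abstract, modularity and NMI, can be illustrated with a minimal Python sketch. The abstract does not state which formulations the authors used, so the choices here are assumptions: modularity follows the standard directed form Q = (1/m) Σ_ij [A_ij − k_i^out k_j^in / m] δ(c_i, c_j), and NMI is normalized as 2I(A;B) / (H(A) + H(B)). The function names and the seven-edge toy graph are hypothetical and are not the paper's 101-node dataset.

```python
from collections import Counter
from math import log


def directed_modularity(edges, community):
    """Modularity of a partition of a directed graph (assumed formulation).

    edges: iterable of (source, target) pairs, e.g. (citing, cited).
    community: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    # Observed fraction of edges whose endpoints share a community.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # Expected fraction under the directed degree-preserving null model:
    # per community, (total out-degree * total in-degree) / m^2.
    out_by_c, in_by_c = Counter(), Counter()
    for u, v in edges:
        out_by_c[community[u]] += 1
        in_by_c[community[v]] += 1
    expected = sum(out_by_c[c] * in_by_c[c] for c in out_by_c) / (m * m)
    return inside - expected


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2*I(A;B) / (H(A) + H(B)) (assumed)."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    h_a = -sum((c / n) * log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in a single community
    return 2 * mutual / (h_a + h_b)


# Hypothetical toy citation graph: two directed triangles joined by one edge.
toy_edges = [(1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4), (3, 4)]
toy_partition = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}

print(directed_modularity(toy_edges, toy_partition))
print(nmi([0, 0, 0, 1, 1, 1], ["x", "x", "x", "y", "y", "y"]))
```

On the toy graph, six of the seven edges fall inside a community and the null-model term is 24/49, so the modularity is 18/49. NMI depends only on how nodes are grouped, not on the label names, which is why two partitions that differ only in naming score 1.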
Source journal
Annals of Emerging Technologies in Computing (Computer Science: Computer Science (all))
CiteScore: 3.50
Self-citation rate: 0.00%
Articles published: 26