Implementation of GraphFrames-Based Parallelized Label Propagation Algorithm in Clusters

Jianxia Wang, Yu Shi, Yunfeng Xu
2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), published 2022-07-01
DOI: 10.1109/ICCEAI55464.2022.00053 (https://doi.org/10.1109/ICCEAI55464.2022.00053)
Citations: 0

Abstract

In the era of big data, the number of network users has exploded, the number of network nodes has grown, and the relationships between nodes have become more intricate. University students without access to a big data experimental environment cannot efficiently apply the traditional label propagation algorithm to large-scale complex network data. To solve these problems, this paper proposes a parallelized label propagation algorithm based on GraphFrames. First, a multi-node big data cluster is built from the existing computer lab resources of universities, and GraphFrames is used to parallelize the label propagation algorithm on that cluster. Experiments show that the parallelized label propagation algorithm based on GraphFrames can handle large-scale complex networks with millions of nodes. The relationship between the running time of the algorithm and the size of the cluster is explored by varying the number of nodes in the cluster. In terms of community division quality, the F-measure on million-node-scale complex networks remains stable at about 60%, and the F-measure on a small-scale real social network improves by 20% compared with other traditional community discovery algorithms.
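
For readers unfamiliar with label propagation, the following is a minimal single-machine sketch of the idea, not the authors' GraphFrames implementation. It assumes a synchronous update rule (every vertex adopts the most frequent label among its neighbors in each round) and breaks ties by taking the smallest label. Both choices, along with the function name `label_propagation` and its parameters, are illustrative assumptions. In a distributed setting, each round would be expressed as message passing over the edge set rather than a Python loop.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=5):
    """Synchronous label propagation on an undirected graph (illustrative sketch).

    edges: iterable of (u, v) pairs; vertex ids must be hashable and orderable.
    Returns a dict mapping each vertex to its final community label.
    """
    # Build an undirected adjacency list, ignoring self-loops.
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Every vertex starts in its own community, labeled by its own id.
    labels = {v: v for v in neighbors}

    for _ in range(max_iter):
        new_labels = {}
        for v, nbrs in neighbors.items():
            # Count the labels currently held by the neighbors of v.
            counts = Counter(labels[n] for n in nbrs)
            best = max(counts.values())
            # Assumed tie-break: the smallest label among the most frequent ones.
            new_labels[v] = min(l for l, c in counts.items() if c == best)
        if new_labels == labels:
            break  # No vertex changed its label, so the result is stable.
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    demo_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
    print(label_propagation(demo_edges, max_iter=10))
```

On the demo graph, the two triangles end up with two distinct labels, one per triangle. Synchronous updates can oscillate on some graphs (bipartite structures in particular), which is why the iteration count is capped by `max_iter` rather than relying on convergence alone.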