Identification and ranking of key persons in a Social Networking Website using Hadoop & Big Data Analytics

Prerna Agarwal, Rafeeq Ahmed, Tanvir Ahmad
DOI: 10.1145/2979779.2979844
Published: 2016-08-12, Proceedings of the International Conference on Advances in Information Communication Technology & Computing
Citations: 3

Abstract

Big Data refers to vast amounts of structured and unstructured data that are challenging to process with traditional algorithms because of their sheer size and the lack of high-speed processing techniques. Nowadays, enormous volumes of digital data are being gathered from many important areas, including social networking websites such as Facebook and Twitter, and mining this big data for analysis is important. One key analysis in this domain is finding the key nodes in a social graph, which can act as the major spreaders of information. Node centrality measures can be used in many graph applications, such as searching and ranking nodes. However, traditional centrality algorithms such as degree centrality, betweenness centrality, and closeness centrality were not designed for data of this scale; they fail on such huge graphs and are therefore difficult to apply directly. In this paper, we compute centrality measures for big graphs with very large numbers of edges and nodes by parallelizing the traditional centrality algorithms, so that they remain efficient as the graph grows. We implement these algorithms with MapReduce and Hadoop for parallel and distributed data processing, present the results and anomalies of these algorithms, and compare the processing time taken on ordinary systems and on Hadoop systems.
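The paper's own Hadoop implementation is not reproduced here. As a rough illustration of how a centrality measure decomposes into the MapReduce model the abstract describes, the simplest case, degree centrality, maps each edge to per-node counts and sums them in the reduce phase. The sketch below simulates both phases in-process on a toy edge list; on a real Hadoop cluster the mapper and reducer would run as distributed tasks over graph partitions.

```python
from collections import defaultdict

# Illustrative sketch (not the paper's code): degree centrality expressed
# in MapReduce style, simulated in a single process on a toy graph.

def map_phase(edges):
    """Mapper: for each undirected edge (u, v), emit (u, 1) and (v, 1)."""
    for u, v in edges:
        yield (u, 1)
        yield (v, 1)

def reduce_phase(pairs):
    """Reducer: sum the counts per node key, giving each node's degree."""
    degree = defaultdict(int)
    for node, count in pairs:
        degree[node] += count
    return dict(degree)

def degree_centrality(edges):
    return reduce_phase(map_phase(edges))

if __name__ == "__main__":
    # Toy social graph: undirected friendship edges between users.
    edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
    print(degree_centrality(edges))  # {'a': 2, 'b': 2, 'c': 3, 'd': 1}
```

Betweenness and closeness centrality require shortest-path computations and therefore need more elaborate multi-round MapReduce jobs than this single map/reduce pass, which is part of what makes parallelizing them the harder problem the paper addresses.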