使用Hadoop和大数据分析对社交网站中的关键人物进行识别和排名

Proceedings of the International Conference on Advances in Information Communication Technology & Computing Pub Date : 2016-08-12 DOI:10.1145/2979779.2979844

Prerna Agarwal, Rafeeq Ahmed, Tanvir Ahmad

{"title":"使用Hadoop和大数据分析对社交网站中的关键人物进行识别和排名","authors":"Prerna Agarwal, Rafeeq Ahmed, Tanvir Ahmad","doi":"10.1145/2979779.2979844","DOIUrl":null,"url":null,"abstract":"Big Data is a term which defines a vast amount of structured and unstructured data which is challenging to process because of its large size, using traditional algorithms and lack of high speed processing techniques. Now a days, vast amount of digital data is being gathered from many important areas, including social networking websites like Facebook and Twitter. It is important for us to mine this big data for analysis purpose. One important analysis in this domain is to find key nodes in a social graph which can be the major information spreader. Node centrality measures can be used in many graph applications such as searching and ranking of nodes. Traditional centrality algorithms fail on such huge graphs therefore it is difficult to use these algorithms on big graphs. Traditional centrality algorithms such as degree centrality, betweenness centrality and closeness centrality were not designed for such large data. In this paper, we calculate centrality measures for big graphs having huge number of edges and nodes by parallelizing traditional centrality algorithms so that they can be used in an efficient way when the size of graph grows. We use MapReduce and Hadoop to implement these algorithms for parallel and distributed data processing. We present results and anomalies of these algorithms and also show the comparative processing time taken on normal systems and on Hadoop systems.","PeriodicalId":298730,"journal":{"name":"Proceedings of the International Conference on Advances in Information Communication Technology & Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Identification and ranking of key persons in a Social Networking Website using Hadoop & Big Data Analytics\",\"authors\":\"Prerna Agarwal, Rafeeq Ahmed, Tanvir Ahmad\",\"doi\":\"10.1145/2979779.2979844\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Big Data is a term which defines a vast amount of structured and unstructured data which is challenging to process because of its large size, using traditional algorithms and lack of high speed processing techniques. Now a days, vast amount of digital data is being gathered from many important areas, including social networking websites like Facebook and Twitter. It is important for us to mine this big data for analysis purpose. One important analysis in this domain is to find key nodes in a social graph which can be the major information spreader. Node centrality measures can be used in many graph applications such as searching and ranking of nodes. Traditional centrality algorithms fail on such huge graphs therefore it is difficult to use these algorithms on big graphs. Traditional centrality algorithms such as degree centrality, betweenness centrality and closeness centrality were not designed for such large data. In this paper, we calculate centrality measures for big graphs having huge number of edges and nodes by parallelizing traditional centrality algorithms so that they can be used in an efficient way when the size of graph grows. We use MapReduce and Hadoop to implement these algorithms for parallel and distributed data processing. We present results and anomalies of these algorithms and also show the comparative processing time taken on normal systems and on Hadoop systems.\",\"PeriodicalId\":298730,\"journal\":{\"name\":\"Proceedings of the International Conference on Advances in Information Communication Technology & Computing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-08-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the International Conference on Advances in Information Communication Technology & Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2979779.2979844\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on Advances in Information Communication Technology & Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2979779.2979844","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

摘要

大数据是一个定义大量结构化和非结构化数据的术语，由于其规模大，使用传统算法和缺乏高速处理技术，因此具有挑战性。如今，大量的数字数据正在从许多重要领域收集，包括像Facebook和Twitter这样的社交网站。对我们来说，挖掘这些大数据用于分析是很重要的。该领域的一个重要分析是在社交图中找到关键节点，这些节点可能是主要的信息传播者。节点中心性度量可用于许多图应用程序，如节点的搜索和排序。传统的中心性算法在如此庞大的图上无法应用，因此很难在大图上应用。传统的中心性算法，如度中心性、中间中心性和接近中心性等，并没有针对如此大的数据进行设计。本文通过并行化传统的中心性算法来计算具有大量边和节点的大图的中心性度量，以便在图的规模增长时能够有效地使用它们。我们使用MapReduce和Hadoop来实现这些算法，以实现并行和分布式数据处理。我们给出了这些算法的结果和异常情况，并展示了在正常系统和Hadoop系统上的比较处理时间。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Identification and ranking of key persons in a Social Networking Website using Hadoop & Big Data Analytics

Big Data is a term which defines a vast amount of structured and unstructured data which is challenging to process because of its large size, using traditional algorithms and lack of high speed processing techniques. Now a days, vast amount of digital data is being gathered from many important areas, including social networking websites like Facebook and Twitter. It is important for us to mine this big data for analysis purpose. One important analysis in this domain is to find key nodes in a social graph which can be the major information spreader. Node centrality measures can be used in many graph applications such as searching and ranking of nodes. Traditional centrality algorithms fail on such huge graphs therefore it is difficult to use these algorithms on big graphs. Traditional centrality algorithms such as degree centrality, betweenness centrality and closeness centrality were not designed for such large data. In this paper, we calculate centrality measures for big graphs having huge number of edges and nodes by parallelizing traditional centrality algorithms so that they can be used in an efficient way when the size of graph grows. We use MapReduce and Hadoop to implement these algorithms for parallel and distributed data processing. We present results and anomalies of these algorithms and also show the comparative processing time taken on normal systems and on Hadoop systems.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the International Conference on Advances in Information Communication Technology & Computing

自引率

0.00%

发文量