Identification and ranking of key persons in a Social Networking Website using Hadoop & Big Data Analytics
Authors: Prerna Agarwal, Rafeeq Ahmed, Tanvir Ahmad
Published in: Proceedings of the International Conference on Advances in Information Communication Technology & Computing, 2016-08-12
DOI: 10.1145/2979779.2979844
Big Data refers to vast amounts of structured and unstructured data that are challenging to process with traditional algorithms because of their size and the lack of high-speed processing techniques. Nowadays, vast amounts of digital data are being gathered from many important areas, including social networking websites such as Facebook and Twitter. It is important to mine this big data for analysis. One important analysis in this domain is finding the key nodes in a social graph, which can act as the major information spreaders. Node centrality measures can be used in many graph applications, such as searching and ranking nodes. Traditional centrality algorithms, such as degree centrality, betweenness centrality and closeness centrality, were not designed for such large data; they fail on such huge graphs and are therefore difficult to use. In this paper, we calculate centrality measures for big graphs with a huge number of edges and nodes by parallelizing traditional centrality algorithms so that they remain efficient as the graph grows. We use MapReduce and Hadoop to implement these algorithms for parallel and distributed data processing. We present the results and anomalies of these algorithms and compare the processing time taken on normal systems and on Hadoop systems.
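To make the idea of parallelizing a centrality measure concrete, the following is a minimal sketch of degree centrality expressed as a map step and a reduce step. It is not the authors' implementation, which the abstract does not describe in detail. It simulates the MapReduce phases in plain Python on a single machine, and the edge-list input format, the function names (`map_edge`, `shuffle`, `reduce_degree`, `degree_centrality`, `rank_nodes`) and the normalization by n - 1 are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[str, str]  # assumed input: one undirected edge per record


def map_edge(edge: Edge) -> Iterator[Tuple[str, int]]:
    """Map step: emit (node, 1) for each endpoint of an undirected edge."""
    u, v = edge
    yield u, 1
    yield v, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, standing in for the framework's shuffle phase."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_degree(node: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the emitted ones to obtain the degree of a node."""
    return node, sum(counts)


def degree_centrality(edges: Iterable[Edge]) -> Dict[str, float]:
    """Degree of each node divided by (n - 1), where n is the node count.

    Assumes a simple graph: no duplicate edges and no self-loops.
    """
    mapped = (pair for edge in edges for pair in map_edge(edge))
    grouped = shuffle(mapped)
    degrees = dict(reduce_degree(node, counts) for node, counts in grouped.items())
    n = len(degrees)
    if n <= 1:
        return {node: 0.0 for node in degrees}
    return {node: degree / (n - 1) for node, degree in degrees.items()}


def rank_nodes(centrality: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order nodes by descending centrality, breaking ties by node name."""
    return sorted(centrality.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
    for node, score in rank_nodes(degree_centrality(sample_edges)):
        print(node, round(score, 3))
```

Because each edge is mapped independently and each node is reduced independently, both steps can be distributed across machines, which is the property that makes degree centrality a natural fit for MapReduce. Betweenness and closeness centrality depend on shortest paths and would need iterative jobs, which this sketch does not cover.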