Massive Social Network Analysis: Mining Twitter for Social Good

2010 39th International Conference on Parallel Processing. Pub Date: 2010-09-13. DOI: 10.1109/ICPP.2010.66

David Ediger, Karl Jiang, E. J. Riedy, David A. Bader, Courtney Corley, R. Farber, William N. Reynolds

Citations: 171

Abstract

Social networks produce an enormous quantity of data. Facebook consists of over 400 million active users sharing over 5 billion pieces of information each month. Analyzing this vast quantity of unstructured data presents challenges for software and hardware. We present GraphCT, a Graph Characterization Toolkit for massive graphs representing social network data. On a 128-processor Cray XMT, GraphCT estimates the betweenness centrality of an artificially generated (R-MAT) 537 million vertex, 8.6 billion edge graph in 55 minutes and a real-world graph (Kwak, et al.) with 61.6 million vertices and 1.47 billion edges in 105 minutes. We use GraphCT to analyze public data from Twitter, a microblogging network. Twitter's message connections appear primarily tree-structured as a news dissemination system. Within the public data, however, are clusters of conversations. Using GraphCT, we can rank actors within these conversations and help analysts focus attention on a much smaller data subset.
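The abstract says GraphCT estimates betweenness centrality but does not describe how. One common way to approximate the metric on large graphs is to run Brandes's algorithm from a random sample of source vertices instead of from every vertex. The following is a minimal single-threaded sketch of that idea, not GraphCT's implementation; the function name `estimate_betweenness`, the `num_sources` parameter, and the adjacency-dictionary input are all assumptions made for illustration.

```python
import random
from collections import deque


def estimate_betweenness(adj, num_sources, seed=0):
    """Estimate betweenness centrality on an unweighted graph.

    adj: dict mapping each vertex to a list of neighbours; every
         neighbour must also appear as a key (illustrative format).
    num_sources: number of source vertices to sample (hypothetical knob).
    """
    vertices = list(adj)
    rng = random.Random(seed)
    k = min(num_sources, len(vertices))
    sources = rng.sample(vertices, k)
    score = {v: 0.0 for v in vertices}

    for s in sources:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        sigma = {v: 0 for v in vertices}
        dist = {v: -1 for v in vertices}
        preds = {v: [] for v in vertices}
        sigma[s] = 1
        dist[s] = 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in vertices}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]

    # Scale so the estimate is comparable to using every vertex as a source.
    scale = len(vertices) / k if k else 0.0
    return {v: score[v] * scale for v in vertices}
```

Sorting the returned dictionary by value gives a ranking of actors, which is the kind of output the abstract describes using to narrow an analyst's attention. With `num_sources` equal to the number of vertices the result is exact; for an undirected graph stored with both edge directions, each unordered pair is counted twice.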


