Experiments with computing similarity coefficient over big data

IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications. Pub Date: 2014-07-07. DOI: 10.1109/IISA.2014.6878734

M. Cosulschi, M. Gabroveanu, Florin Slabu, Adriana Sbircea

Citations: 1

Abstract

Big data is a hot topic nowadays due to the huge amount of data resulting from various commercial processes and the data handled every day by social networks. The MapReduce programming model focuses on processing and generating large data sets. Using the values obtained by computing the Jaccard similarity coefficients for two very large graphs, we analysed the connections and influence that some nodes have over other nodes. Furthermore, we show how the Apache Hadoop framework and the MapReduce programming model can be used for high-volume computations. All tests were performed on a distributed cluster in order to obtain the results described in the paper.
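
For illustration, the Jaccard similarity coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The abstract does not describe the authors' Hadoop implementation, so the following is only a minimal single-process sketch in Python, assuming that the coefficient is computed over the neighbour sets of graph nodes and that the graph is undirected. The function names (map_phase, reduce_phase, pairwise_jaccard) and the sample edge list are hypothetical and merely mimic the map and reduce steps of the MapReduce model.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_phase(edges):
    """Map step (hypothetical): emit (node, neighbour) pairs for each undirected edge."""
    for u, v in edges:
        yield u, v
        yield v, u


def reduce_phase(pairs):
    """Reduce step (hypothetical): group the emitted neighbours by node."""
    neighbours = defaultdict(set)
    for node, neighbour in pairs:
        neighbours[node].add(neighbour)
    return dict(neighbours)


def pairwise_jaccard(edges):
    """Jaccard coefficient of the neighbour sets for every pair of nodes."""
    neighbours = reduce_phase(map_phase(edges))
    return {
        (u, v): jaccard(neighbours[u], neighbours[v])
        for u, v in combinations(sorted(neighbours), 2)
    }


# Tiny made-up graph, used only to exercise the functions.
edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
scores = pairwise_jaccard(edges)
```

Comparing every pair of nodes is quadratic in the number of nodes, so this sketch is only suitable for small inputs; a distributed job over very large graphs would need a different way of generating candidate pairs.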

