Experiments with computing similarity coefficient over big data

IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications
Pub Date: 2014-07-07
DOI: 10.1109/IISA.2014.6878734

M. Cosulschi, M. Gabroveanu, Florin Slabu, Adriana Sbircea


Citations: 1

Abstract

Big data is a hot topic today because of the huge volume of data generated by commercial processes and handled every day by social networks. The MapReduce programming model targets the processing and generation of large data sets. Using Jaccard similarity coefficients computed for two very large graphs, we analysed the connections between nodes and the influence that some nodes have over others. We also show how the Apache Hadoop framework and the MapReduce programming model can be used for high-volume computations. All tests were run on a distributed cluster to obtain the results described in the paper.
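For readers unfamiliar with the coefficient, the Jaccard similarity of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below is illustrative only and is not the authors' implementation: it assumes the coefficient is taken over the neighbor sets of two nodes (the abstract does not say which sets are compared), and it imitates the map and reduce steps in plain Python on one machine rather than on Hadoop. The function names and the toy edge list are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(edges):
    """Map step: emit (node, neighbor) for both directions of each undirected edge."""
    for u, v in edges:
        yield u, v
        yield v, u


def reduce_phase(pairs):
    """Reduce step: group the emitted values by key into one neighbor set per node."""
    neighbors = defaultdict(set)
    for node, neighbor in pairs:
        neighbors[node].add(neighbor)
    return dict(neighbors)


def jaccard(a, b):
    """Return |a & b| / |a | b|, taken here as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_jaccard(edges):
    """Compute the coefficient for every unordered pair of nodes (assumed setup)."""
    neighbors = reduce_phase(map_phase(edges))
    return {
        (u, v): jaccard(neighbors[u], neighbors[v])
        for u, v in combinations(sorted(neighbors), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy graph, not data from the paper.
    toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
    for pair, score in pairwise_jaccard(toy_edges).items():
        print(pair, round(score, 3))
```

Comparing every pair of nodes is quadratic in the number of nodes, which is only workable for a toy graph; on very large graphs a distributed job would typically restrict the comparison to pairs that share at least one neighbor.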
