Experiments with computing similarity coefficient over big data
M. Cosulschi, M. Gabroveanu, Florin Slabu, Adriana Sbircea
IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications
Published: 2014-07-07
DOI: 10.1109/IISA.2014.6878734 (https://doi.org/10.1109/IISA.2014.6878734)
Citations: 1
Abstract
Big data is a hot topic today because of the huge amount of data produced by various commercial processes and the everyday data handled by social networks. The MapReduce programming model focuses on processing and generating large data sets. Using the values obtained by computing the Jaccard similarity coefficients for two very large graphs, we analysed the connections and influences that some nodes have over other nodes. Furthermore, we show how the Apache Hadoop framework and the MapReduce programming model can be used for high-volume computations. All tests were performed on a distributed cluster in order to obtain the results described in the paper.
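The abstract does not say how the Jaccard coefficient is defined over graph nodes or how the MapReduce job is structured. As a rough illustration only, the sketch below assumes an undirected graph and the common neighbor-set definition J(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, and it imitates the map and reduce phases in memory with plain Python rather than Hadoop. All function names (`jaccard`, `map_phase`, `reduce_phase`, `jaccard_over_graph`) are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard coefficient |a ∩ b| / |a ∪ b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def map_phase(edges):
    """Map step (assumed design): key each node by its neighbor, so that
    nodes sharing a neighbor end up under the same key."""
    for u, v in edges:
        yield v, u
        yield u, v


def reduce_phase(pairs):
    """Reduce step (assumed design): group nodes by key, then count how many
    neighbors each pair of nodes has in common."""
    grouped = defaultdict(set)
    for neighbor, node in pairs:
        grouped[neighbor].add(node)

    shared = defaultdict(int)
    for nodes in grouped.values():
        for a, b in combinations(sorted(nodes), 2):
            shared[(a, b)] += 1
    return shared


def jaccard_over_graph(edges):
    """Jaccard coefficient for every node pair with at least one common neighbor."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    shared = reduce_phase(map_phase(edges))
    # |N(a) ∪ N(b)| = |N(a)| + |N(b)| - |N(a) ∩ N(b)|
    return {
        (a, b): n / (len(neighbors[a]) + len(neighbors[b]) - n)
        for (a, b), n in shared.items()
    }


if __name__ == "__main__":
    toy_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "e")]
    for pair, score in sorted(jaccard_over_graph(toy_edges).items()):
        print(pair, round(score, 3))
```

Only pairs that share at least one neighbor are emitted, which avoids enumerating all node pairs. On a real cluster the grouping inside `reduce_phase` would be done by the framework's shuffle stage, and nodes with very high degree would likely need extra handling that this sketch ignores.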