Scalable Local Community Detection with MapReduce for Large Networks
Authors: Ren Wang, Andong Wang, Talat Syed, Osmar R Zaiane
Journal: International Journal of Data Mining & Knowledge Management Process
Published: 2017-03-31
DOI: 10.5121/IJDKP.2017.7203 (https://doi.org/10.5121/IJDKP.2017.7203)
Citations: 0
Abstract
Community detection in complex information networks attracts considerable attention from both academia and industry because of its many real-world applications. However, scaling community detection algorithms to very large networks remains a major challenge: real-world graphs are often both structurally complicated and extremely large. In this paper, we propose 3MA, a MapReduce algorithm that parallelizes a local community identification method based on the $M$ metric. We then adopt an iterative expansion approach to find all the communities in the graph. Empirical results show that for large networks on the order of millions of nodes, the parallel version outperforms the traditional sequential approach to detecting communities with the $M$ metric. In particular, when the data is too large for the original sequential, $M$ metric-based iterative expansion approach to handle, 3MA can still finish in a reasonable time.
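The abstract does not define the $M$ metric or the expansion procedure, so the following is only a minimal sequential sketch under stated assumptions, not the authors' 3MA implementation. It assumes the commonly used definition of $M$ as the number of edges inside a community divided by the number of edges leaving it, and a greedy expansion that adds one neighbouring node at a time while $M$ improves. The function names (`build_adjacency`, `m_metric`, `expand_local_community`, `detect_all`) and the seed-selection rule are hypothetical choices made for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def m_metric(adj, community):
    """Assumed definition: internal edge count / external edge count."""
    internal_endpoints = 0  # each internal edge is seen from both ends
    external = 0
    for u in community:
        for v in adj[u]:
            if v in community:
                internal_endpoints += 1
            else:
                external += 1
    internal = internal_endpoints // 2
    if external == 0:
        # No edge leaves the community, so it cannot be improved further.
        return float("inf")
    return internal / external


def expand_local_community(adj, seed):
    """Greedily grow a community from `seed` while M strictly increases."""
    community = {seed}
    current = m_metric(adj, community)
    while True:
        # Candidate nodes: neighbours of the community not yet inside it.
        frontier = {v for u in community for v in adj[u]} - community
        best_node, best_score = None, current
        for v in sorted(frontier):  # sorted only to make ties deterministic
            score = m_metric(adj, community | {v})
            if score > best_score:
                best_node, best_score = v, score
        if best_node is None:
            return community
        community.add(best_node)
        current = best_score


def detect_all(adj):
    """Repeat local expansion from uncovered seeds (a simplification)."""
    remaining = set(adj)
    communities = []
    while remaining:
        seed = min(remaining)  # hypothetical seed rule: smallest uncovered id
        found = expand_local_community(adj, seed)
        communities.append(found)
        remaining -= found
    return communities


# Two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = build_adjacency(edges)
print(detect_all(adj))
```

Each candidate evaluation in the inner loop rescans the whole community, which is the kind of cost that becomes prohibitive on graphs with millions of nodes. Because the candidate scores are independent of one another, that step is a natural place to distribute work across mappers, although how 3MA actually partitions the computation is not stated in the abstract.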