Fast and Parallel Ranking-based Clustering for Heterogeneous Graphs
Kotaro Yamazaki, Tomoki Sato, Hiroaki Shiokawa, H. Kitagawa
J. Data Intell., published 2020-06-01. DOI: 10.26421/JDI1.2-3 (https://doi.org/10.26421/JDI1.2-3)
Citations: 2
Abstract
Demand for graph data analysis methods is increasing. RankClus is a framework that extracts clusters by integrating clustering and ranking on heterogeneous graphs; it improves the clustering results by alternately updating the clustering and the ranking, which also makes the clusters easier to understand. However, RankClus is computationally expensive on large graphs because it must iterate both clustering and ranking over all nodes. To address this problem, we propose a novel fast RankClus algorithm for heterogeneous graphs. To speed up the entire RankClus procedure, the proposed algorithm reduces the computational cost of the ranking process in each iteration: it measures how much each node affects the clustering result and prunes the node if the effect is not significant. Furthermore, we present a parallel algorithm that extends the proposed algorithm to fully exploit a modern manycore CPU. Our extensive evaluations show that the fast and parallel algorithms drastically cut the computation time of the original RankClus algorithm.
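The pruning idea can be illustrated with a small sketch. The abstract does not state the actual pruning criterion, so everything below is an assumption for illustration only: the bipartite weight matrix `weights`, the function name `pruned_ranking`, the threshold `eps`, and the rule that a node is pruned once its ranking score changes by less than `eps` between iterations. It is not the authors' algorithm, only one plausible way a ranking loop might skip nodes that no longer matter.

```python
from typing import List, Tuple


def normalize(scores: List[float]) -> List[float]:
    """Scale scores so they sum to 1 (returned unchanged if the sum is 0)."""
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else list(scores)


def pruned_ranking(
    weights: List[List[float]],
    eps: float = 1e-4,
    max_iter: int = 100,
) -> Tuple[List[float], List[float], int]:
    """Mutually reinforcing ranking on a bipartite graph with node pruning.

    weights[i][j] is the edge weight between target node i and attribute
    node j. Returns (target scores, attribute scores, number of target-node
    updates performed). The pruning rule is a hypothetical stand-in.
    """
    n_x, n_y = len(weights), len(weights[0])
    rank_x = [1.0 / n_x] * n_x
    rank_y = [1.0 / n_y] * n_y
    active = set(range(n_x))  # target nodes that are still being updated
    updates = 0

    for _ in range(max_iter):
        if not active:
            break  # every node has been pruned
        # Attribute scores are propagated from the current target scores.
        rank_y = normalize(
            [sum(weights[i][j] * rank_x[i] for i in range(n_x)) for j in range(n_y)]
        )
        # Only active target nodes are recomputed; pruned ones keep their score.
        raw = {i: sum(weights[i][j] * rank_y[j] for j in range(n_y)) for i in active}
        updates += len(raw)
        raw_total = sum(raw.values())
        frozen_mass = sum(rank_x[i] for i in range(n_x) if i not in active)
        for i in sorted(raw):
            # Active nodes share the score mass not held by pruned nodes.
            new = raw[i] / raw_total * (1.0 - frozen_mass) if raw_total > 0 else rank_x[i]
            if abs(new - rank_x[i]) < eps:
                active.discard(i)  # assumed rule: negligible change, so prune
            rank_x[i] = new
    return rank_x, rank_y, updates
```

Calling `pruned_ranking([[3.0, 1.0], [1.0, 0.0], [0.0, 1.0]])` ranks the first target node highest, and the returned update count shows how much work the pruning saved compared with updating every node in every iteration.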