Large-Scale BSP Graph Processing in Distributed Non-Volatile Memory
T. Nito, Yoshiko Nagasaka, H. Uchigaito
Proceedings of the GRADES'15, published 2015-05-31
DOI: 10.1145/2764947.2764949 (https://doi.org/10.1145/2764947.2764949)
Citations: 2
Abstract
Processing large graphs is becoming increasingly important in many domains. Large-scale graph processing normally requires a large-scale cluster system, which is very expensive. To enable high-performance large-scale graph processing on small clusters, we have therefore developed bulk synchronous parallel (BSP) graph processing in distributed non-volatile memory, which has a lower bit cost, lower power consumption, and larger capacity than DRAM. When non-volatile memory is used, accessing it becomes a performance bottleneck, because the accesses are fine-grained and random, and non-volatile memory has much higher latency than DRAM. We therefore propose a non-volatile memory group access method, together with an implementation, for using non-volatile memory efficiently. The proposed method and implementation improve access performance by converting fine-grained random accesses into random accesses of the same size as a non-volatile memory page, and by hiding non-volatile memory latency with pipelining. An evaluation indicated that the proposed graph processing can hide the latency of non-volatile memory and that its performance is proportional to non-volatile memory bandwidth. With a mixed read/write non-volatile memory bandwidth of 4.2 GB/s, the proposed graph processing achieves performance of the same order of magnitude as storing all data in main memory (46%). In addition, the proposed graph processing showed scalable performance across the numbers of nodes evaluated. The proposed method and implementation can process graphs 125 times larger than a DRAM-only system can.
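The core of the group access idea, turning many fine-grained random accesses into a few page-sized ones, can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It assumes a hypothetical 4 KiB page size, an NVM that only supports whole-page reads and writes (modeled by the hypothetical `SimulatedNVM` class), and vertex values stored as 64-bit integers. It compares applying each fine-grained update as its own page read-modify-write against grouping updates by page so that each page is read and written once. Latency hiding through pipelining is not modeled here.

```python
from collections import defaultdict
import struct

PAGE_SIZE = 4096  # hypothetical NVM page size in bytes (assumption)
SLOT = 8          # each vertex value is one little-endian signed 64-bit integer


class SimulatedNVM:
    """Hypothetical stand-in for NVM: only whole-page I/O, with access counters."""

    def __init__(self, num_pages):
        self.data = bytearray(num_pages * PAGE_SIZE)
        self.page_reads = 0
        self.page_writes = 0

    def read_page(self, page):
        self.page_reads += 1
        start = page * PAGE_SIZE
        return bytearray(self.data[start:start + PAGE_SIZE])  # private copy

    def write_page(self, page, buf):
        self.page_writes += 1
        start = page * PAGE_SIZE
        self.data[start:start + PAGE_SIZE] = buf


def apply_naive(nvm, updates):
    """Fine-grained access: one page read-modify-write per (vertex, delta) update."""
    for vertex, delta in updates:
        page, off = divmod(vertex * SLOT, PAGE_SIZE)
        buf = nvm.read_page(page)
        (old,) = struct.unpack_from("<q", buf, off)
        struct.pack_into("<q", buf, off, old + delta)
        nvm.write_page(page, buf)


def apply_grouped(nvm, updates):
    """Group access: bucket updates by page, then touch each page exactly once."""
    by_page = defaultdict(list)
    for vertex, delta in updates:
        page, off = divmod(vertex * SLOT, PAGE_SIZE)
        by_page[page].append((off, delta))
    for page in sorted(by_page):  # page-sized accesses, issued in page order
        buf = nvm.read_page(page)
        for off, delta in by_page[page]:
            (old,) = struct.unpack_from("<q", buf, off)
            struct.pack_into("<q", buf, off, old + delta)
        nvm.write_page(page, buf)


def read_value(nvm, vertex):
    """Read one vertex value directly (for checking results only)."""
    (value,) = struct.unpack_from("<q", nvm.data, vertex * SLOT)
    return value


# Usage: 10,000 unit updates spread over 2,048 vertices (four 4 KiB pages).
updates = [((i * 37) % 2048, 1) for i in range(10_000)]
naive, grouped = SimulatedNVM(4), SimulatedNVM(4)
apply_naive(naive, updates)
apply_grouped(grouped, updates)
print(naive.page_reads, grouped.page_reads)  # 10000 4
```

Both paths leave identical contents, but the naive path issues one page read and one page write per update, while the grouped path issues one of each per touched page. In a BSP setting, such grouping would plausibly be done once per superstep, after all messages for that superstep are known; that placement is an assumption of this sketch.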