GridGAS: An I/O-Efficient Heterogeneous FPGA+CPU Computing Platform for Very Large-Scale Graph Analytics
Authors: Yu Zou, Mingjie Lin
Venue: 2018 International Conference on Field-Programmable Technology (FPT)
Published: 2018-12-01
DOI: 10.1109/FPT.2018.00045 (https://doi.org/10.1109/FPT.2018.00045)
In this paper, we develop a highly scalable approach to constructing an efficient heterogeneous graph processing engine that handles graphs far larger than its on-board memory capacity. Our FPGA-based computing engine not only surpasses cutting-edge GPU-based engines in computing performance and energy efficiency, but is also versatile enough to be applied to many types of low-latency, high-throughput graph analytic tasks central to next-generation graph-based machine learning. We analyze in detail the differences between GPU and FPGA architectures and give several fundamental reasons why, for irregular computations, an FPGA may surpass a GPU in computing latency and energy efficiency. We also discuss some "golden rules" for designing an efficient FPGA+CPU heterogeneous platform, as well as the GPU's inefficiency when handling extremely large-scale graph datasets. To validate our approach, we implement the GridGAS computing engine on a Xilinx KC705 FPGA board, implement a baseline that follows the same approach on a Quadro K420 GPU, and test both with large-scale graph datasets. Using only PCIe 2.0 x8, our architecture achieves up to 170.4 MTEPS (million traversed edges per second) and up to a 14.8x speedup over the GPU baseline for datasets exceeding 1.4 GB in size.
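To put the throughput figure in context, the following minimal sketch shows how an MTEPS value is computed and how it compares with a simple link-bandwidth ceiling. It is an illustration under stated assumptions, not the authors' measurement method. The function names and the 8-byte edge encoding are hypothetical and are not taken from the paper. The 500 MB/s per-lane figure is the theoretical PCIe 2.0 rate per direction after 8b/10b encoding, so sustained rates would be lower in practice.

```python
# Hypothetical helpers for reasoning about edge throughput; not from the paper.

PCIE2_MB_PER_LANE = 500.0  # theoretical PCIe 2.0 payload rate per lane, per direction


def mteps(edges_traversed: int, seconds: float) -> float:
    """Return throughput in million traversed edges per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return edges_traversed / seconds / 1e6


def link_bound_mteps(lanes: int, bytes_per_edge: int,
                     mb_per_lane: float = PCIE2_MB_PER_LANE) -> float:
    """Upper bound on MTEPS if every edge is streamed once over the link.

    Assumes edges are sent uncompressed at bytes_per_edge bytes each
    (an assumption made here for illustration only).
    """
    if lanes <= 0 or bytes_per_edge <= 0:
        raise ValueError("lanes and bytes_per_edge must be positive")
    bandwidth_bytes_per_s = lanes * mb_per_lane * 1e6
    return bandwidth_bytes_per_s / bytes_per_edge / 1e6


if __name__ == "__main__":
    # Assumed edge format: two 32-bit vertex IDs, i.e. 8 bytes per edge.
    bound = link_bound_mteps(lanes=8, bytes_per_edge=8)
    print(f"PCIe 2.0 x8 streaming bound: {bound:.1f} MTEPS")
    print(f"Reported peak as a fraction of that bound: {170.4 / bound:.2f}")
```

Under these assumptions the reported peak sits below the link ceiling, which is consistent with an engine whose throughput is governed largely by how efficiently edges are streamed over PCIe. A different edge encoding would shift the ceiling proportionally.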