Tianqi Wang, Xi Jin, Bo Peng, Chuanjun Wang, Linlin Zheng
2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2016. DOI: 10.1109/FCCM.2016.20
RP-Ring: A Heterogeneous Multi-FPGA Accelerating Solution for N-Body Simulations
We propose RP-ring (Reconfigurable Processor ring), a heterogeneous multi-FPGA acceleration solution for direct-summation N-body simulation. To reduce cost, we use existing FPGA boards rather than designing new specialized ones. The system can easily be expanded with any available FPGA board and requires only low communication bandwidth between boards. The communication protocol is simple and can be implemented with limited hardware and software resources. To prevent the slowest board from dragging down overall performance, we build a mathematical model that decomposes the workload among the FPGAs according to the logic resources, memory access bandwidth, and communication bandwidth of each FPGA chip. We apply the solution to astrodynamics simulation and achieve a speedup of two orders of magnitude over CPU implementations.
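The load-balancing idea can be illustrated with a minimal Python sketch. This is not the authors' model. It assumes that each board's sustainable rate, measured in pairwise interactions per second, is the tightest of three bounds derived from its logic (pipelines times clock), its memory bandwidth, and its link bandwidth. It then splits the target particles in proportion to that rate so that all boards finish a direct-summation step at about the same time. The `Board` fields, the per-interaction byte costs, and the example board figures are all hypothetical.

```python
from dataclasses import dataclass
from math import floor

# Hypothetical cost constants (assumptions, not figures from the paper):
MEM_BYTES_PER_INTERACTION = 4.0   # amortized off-chip memory traffic per interaction
LINK_BYTES_PER_INTERACTION = 0.5  # amortized ring-link traffic per interaction


@dataclass
class Board:
    """Hypothetical description of one FPGA board in the ring."""
    name: str
    pipelines: int    # force pipelines that fit in the chip's logic
    clock_hz: float   # pipeline clock; assume one interaction per pipeline per cycle
    mem_bw: float     # off-chip memory bandwidth, bytes/s
    link_bw: float    # bandwidth of the link to the next board, bytes/s


def effective_rate(b: Board) -> float:
    """Sustainable interactions/s: the tightest of the compute, memory and link bounds."""
    compute = b.pipelines * b.clock_hz
    memory = b.mem_bw / MEM_BYTES_PER_INTERACTION
    link = b.link_bw / LINK_BYTES_PER_INTERACTION
    return min(compute, memory, link)


def partition(n: int, boards: list[Board]) -> list[int]:
    """Assign n target particles to boards in proportion to their effective rates.

    Largest-remainder rounding keeps the integer counts summing exactly to n.
    """
    rates = [effective_rate(b) for b in boards]
    total = sum(rates)
    ideal = [n * r / total for r in rates]
    counts = [floor(x) for x in ideal]
    leftover = n - sum(counts)
    # Give the remaining particles to the boards with the largest fractional parts.
    by_remainder = sorted(range(len(boards)), key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


def step_time(n: int, boards: list[Board], counts: list[int]) -> float:
    """Estimated time of one direct-summation step.

    Board i evaluates counts[i] * n interactions (its targets against all n
    sources circulating around the ring); the step ends when the slowest
    board finishes.
    """
    return max(c * n / effective_rate(b) for c, b in zip(counts, boards))


# Made-up example boards: one fast, one with fewer pipelines, one memory-starved.
EXAMPLE_BOARDS = [
    Board("big", pipelines=100, clock_hz=200e6, mem_bw=100e9, link_bw=20e9),
    Board("small", pipelines=50, clock_hz=200e6, mem_bw=100e9, link_bw=20e9),
    Board("mem_starved", pipelines=100, clock_hz=200e6, mem_bw=40e9, link_bw=20e9),
]

if __name__ == "__main__":
    balanced = partition(1000, EXAMPLE_BOARDS)
    even = [333, 333, 334]
    print(balanced)  # [500, 250, 250]
    print(step_time(1000, EXAMPLE_BOARDS, balanced) < step_time(1000, EXAMPLE_BOARDS, even))  # True
```

With these made-up figures, the proportional split assigns 500, 250, and 250 particles, and every board finishes at the same time. An even 333/333/334 split leaves the two slower boards (one compute-bound, one memory-bound) as the bottleneck. Taking the minimum of several throughput bounds per board is a simple roofline-style choice; a real model would calibrate the byte costs against the actual pipeline and memory layout.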