Spica: Exploring FPGA Optimizations to Enable an Efficient SpMV Implementation for Computations at Edge
Dheeraj Ramchandani, Bahar Asgari, Hyesoon Kim
2023 IEEE International Conference on Edge Computing and Communications (EDGE), 2023-07-01
DOI: 10.1109/EDGE60047.2023.00018 (https://doi.org/10.1109/EDGE60047.2023.00018)
Citations: 0
Abstract
FPGA boards equipped with high bandwidth memory (HBM) have become more attractive for implementing memory-intensive computational kernels such as sparse matrix-vector multiplication (SpMV), which has a wide range of applications in edge computing, from deep learning to robotics. A specialized implementation of SpMV on FPGAs enables efficient use of the limited resources in edge systems. High-level synthesis (HLS) compilers, meanwhile, have eased FPGA programming, leading to a faster development cycle. Even so, obtaining maximum throughput still requires careful optimization, even for a kernel as straightforward as SpMV. This paper therefore explores the impact of optimization techniques such as temporal parallelism, spatial parallelism, and memory alignment that help SpMV fully utilize the available HBM bandwidth on a Xilinx FPGA board and achieve close-to-peak throughput without wasting resources. We conclude by proposing Spica, a high-throughput tree-based SpMV implementation.
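For readers unfamiliar with the kernel, the following is a minimal software sketch of SpMV, not the Spica design itself. The abstract says only that Spica is "tree-based"; the sketch assumes this refers to an adder-tree-style pairwise reduction of the per-row products, and it assumes the matrix is stored in compressed sparse row (CSR) format. The names `tree_sum` and `spmv_csr` are hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def tree_sum(values: Sequence[float]) -> float:
    """Sum values pairwise, level by level, the way an adder tree would.

    Assumption: this models a generic adder tree, not Spica's actual datapath.
    """
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # Add neighbouring pairs; each pass corresponds to one tree level.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            # An odd element out is carried unchanged to the next level.
            nxt.append(level[-1])
        level = nxt
    return level[0]


def spmv_csr(
    row_ptr: Sequence[int],
    col_idx: Sequence[int],
    vals: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x for a matrix A given in CSR form (assumed format)."""
    y = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        # The multiplications within a row are independent of each other,
        # which is what spatial parallelism in hardware could exploit.
        products = [vals[k] * x[col_idx[k]] for k in range(start, end)]
        y.append(tree_sum(products))
    return y


# Example matrix (dense view):
# [[1, 0, 2],
#  [0, 3, 0],
#  [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]
y = spmv_csr(row_ptr, col_idx, vals, x)
print(y)  # [7.0, 6.0, 19.0]
```

The pairwise reduction takes roughly log2(n) levels for n products instead of a length-n chain of dependent additions, which is one reason tree structures are a common choice for pipelined hardware reductions. Whether Spica uses the tree this way is not stated in the abstract.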