Yuta Nagahara, Jiale Yan, Kazushi Kawamura, Masato Motomura, Thiem Van Chu
2024 IEEE International Conference on Consumer Electronics (ICCE), pp. 1-2. Published 2024-01-06. DOI: 10.1109/ICCE59016.2024.10444348 (https://doi.org/10.1109/ICCE59016.2024.10444348)
Efficient COO to CSR Conversion for Accelerating Sparse Matrix Processing on FPGA
Sparse matrix processing is an important computational kernel widely applied in fields such as graph processing and big data analysis. It is often the bottleneck in these applications, so various accelerators have been proposed for it. These accelerators typically process sparse matrices in compressed formats such as Coordinate (COO) or Compressed Sparse Row (CSR), which store only the nonzero elements, to reduce memory usage. Some accelerators convert output matrices computed in COO format to CSR format, which offers a higher compression ratio, in order to reduce external memory traffic. Because external memory bandwidth is the performance bottleneck in most cases, designing an efficient COO-to-CSR (COO2CSR) converter is an important problem. In this paper, we propose a COO2CSR conversion method that overcomes the challenge of performing the conversion in a highly parallel manner without prior knowledge of the target matrix. Based on this method, we develop a high-performance COO2CSR converter. In simulation, the converter achieves near-optimal throughput. In addition, logic synthesis results on an Alveo U55C FPGA board show that the converter consumes only 1.07% of LUTs, 0.65% of FFs, and 1.24% of BRAMs.
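For readers unfamiliar with the two formats, the following is a minimal sequential software sketch of what a COO-to-CSR conversion computes. It is not the parallel hardware method proposed in the paper, which the abstract does not detail; it is the textbook counting-sort approach, and the names `coo_to_csr`, `coo_words`, and `csr_words` are hypothetical helpers introduced here for illustration. The word counts assume one storage word per index and per value.

```python
from typing import List, Tuple


def coo_to_csr(
    n_rows: int, rows: List[int], cols: List[int], vals: List[float]
) -> Tuple[List[int], List[int], List[float]]:
    """Sequential reference conversion (illustrative, not the paper's design).

    COO entries may arrive in any order; no prior knowledge of the
    per-row nonzero counts is assumed.
    """
    nnz = len(vals)

    # Pass 1: count the nonzeros in each row.
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1

    # Prefix sum turns counts into row start offsets.
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]

    # Pass 2: scatter each entry into the next free slot of its row.
    next_slot = row_ptr[:-1]  # slicing copies, so row_ptr is not modified
    col_idx = [0] * nnz
    csr_vals = [0.0] * nnz
    for r, c, v in zip(rows, cols, vals):
        k = next_slot[r]
        col_idx[k] = c
        csr_vals[k] = v
        next_slot[r] += 1

    return row_ptr, col_idx, csr_vals


def coo_words(nnz: int) -> int:
    # COO stores a (row, column, value) triple per nonzero.
    return 3 * nnz


def csr_words(n_rows: int, nnz: int) -> int:
    # CSR stores a (column, value) pair per nonzero plus n_rows + 1 offsets.
    return 2 * nnz + n_rows + 1


# A 4x4 matrix with 6 nonzeros, given in arbitrary order.
rows = [2, 0, 3, 0, 2, 1]
cols = [1, 0, 3, 2, 2, 3]
vals = [5.0, 1.0, 7.0, 2.0, 6.0, 4.0]

row_ptr, col_idx, csr_vals = coo_to_csr(4, rows, cols, vals)
print(row_ptr)   # [0, 2, 3, 5, 6]
print(col_idx)   # [0, 2, 3, 1, 2, 3]
print(csr_vals)  # [1.0, 2.0, 4.0, 5.0, 6.0, 7.0]
print(coo_words(6), csr_words(4, 6))  # 18 17
```

Under these assumptions CSR is smaller than COO whenever the number of nonzeros exceeds the number of rows plus one, which is the sense in which it gives a higher compression ratio. Note that this sketch needs two passes over the data because the row counts are unknown up front; the paper's contribution concerns doing the conversion in a highly parallel manner despite that lack of prior knowledge.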