PTPS: Precision-Aware Task Partitioning and Scheduling for SpMV on CPU-FPGA Heterogeneous Platforms
Jianhua Gao; Zhi Zhou; Xingze Huang; Juan Wang; Yizhuo Wang; Weixing Ji
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 10, pp. 3804-3815. Published 2025-03-24. Journal article.
DOI: 10.1109/TCAD.2025.3554144
URL: https://ieeexplore.ieee.org/document/10937993/
Impact factor: 2.9. JCR: Q2 (Computer Science, Hardware & Architecture). Region: 3 (Computer Science).
Citations: 0
Abstract
The CPU-FPGA heterogeneous computing architecture is extensively employed in the embedded domain due to its low cost and power efficiency, with numerous sparse matrix-vector multiplication (SpMV) acceleration efforts already targeting this architecture. However, existing work rarely includes collaborative SpMV computations between CPU and FPGA, which limits the exploration of hybrid architectures that could potentially offer enhanced performance and flexibility. This article introduces an FPGA architecture design that supports multiprecision SpMV computations, including FP16, FP32, and FP64. Building on this, PTPS, a precision-aware SpMV task partitioning and dynamic scheduling algorithm tailored for the CPU-FPGA heterogeneous architecture, is proposed. The core idea of PTPS is lossless partitioning of sparse matrices across multiple precisions, prioritizing low-precision SpMV computations on the FPGA and high-precision computations on the CPU. PTPS not only leverages the strengths of CPU and FPGA for collaborative SpMV computations but also reduces data transmission overhead between them, thereby improving the overall computational efficiency. Experimental evaluation demonstrates that the proposed approach offers an average speedup of $1.57\times$ over the CPU-only approach and $2.58\times$ over the FPGA-only approach.
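The abstract does not spell out how the lossless partitioning is performed. The sketch below is one plausible reading, not the authors' algorithm: it assumes "lossless" means that each nonzero is placed in the narrowest of FP16, FP32, and FP64 that reproduces its value exactly. The function names (`lowest_lossless_precision`, `partition_by_precision`, `spmv_coo`), the COO triple layout, and the sample matrix are all hypothetical. The dynamic scheduling between CPU and FPGA is not modeled.

```python
import struct

# Assumption: a value belongs to a precision class if packing it into that
# IEEE 754 format and unpacking it again returns exactly the same value.
_FORMATS = (("fp16", "<e"), ("fp32", "<f"))


def lowest_lossless_precision(value):
    """Return 'fp16', 'fp32', or 'fp64' for the narrowest exact format."""
    for name, fmt in _FORMATS:
        try:
            if struct.unpack(fmt, struct.pack(fmt, value))[0] == value:
                return name
        except OverflowError:
            pass  # magnitude is too large for this format; try a wider one
    return "fp64"


def partition_by_precision(entries):
    """Split COO triples (row, col, value) into one sub-matrix per precision."""
    parts = {"fp16": [], "fp32": [], "fp64": []}
    for row, col, value in entries:
        parts[lowest_lossless_precision(value)].append((row, col, value))
    return parts


def spmv_coo(entries, x, n_rows):
    """Reference SpMV y = A @ x for a matrix stored as COO triples."""
    y = [0.0] * n_rows
    for row, col, value in entries:
        y[row] += value * x[col]
    return y


# Hypothetical 3x2 matrix with nonzeros that need different precisions.
entries = [
    (0, 0, 0.5),               # exact in FP16
    (0, 1, 0.1),               # not exact below FP64
    (1, 1, 1.0 + 2.0 ** -20),  # needs more mantissa bits than FP16 has
    (2, 0, 70000.0),           # exceeds the FP16 range, exact in FP32
]
x = [2.0, 1.0]
n_rows = 3

parts = partition_by_precision(entries)

# Following the priority stated in the abstract, the low-precision parts
# would be candidates for the FPGA and the FP64 part for the CPU. Here each
# partial product is simply computed on the host and the results are summed.
partials = [spmv_coo(parts[p], x, n_rows) for p in ("fp16", "fp32", "fp64")]
combined = [sum(column) for column in zip(*partials)]
full = spmv_coo(entries, x, n_rows)
```

Because every nonzero lands in exactly one sub-matrix with its value unchanged, the partial products add up to the unsplit result, up to differences in floating-point summation order.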
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.