Design and implementation of a high performance matrix multiplier core for Xilinx Virtex FPGAs

2003 IEEE International Workshop on Computer Architectures for Machine Perception. Pub Date: 2003-05-12. DOI: 10.1109/CAMP.2003.1598160

S. Belkacemi, K. Benkrid, D. Crookes, A. Benkrid


Citations: 13

Abstract

Matrix multiplication is a core operation in digital signal processing, with applications in image processing, computer graphics, sonar processing and robotics. This paper presents the design and implementation of a high performance, fully parallel matrix multiplication core. The core is parameterised and scalable in terms of the matrices' dimensions (number of rows and columns) and the input data word length. Fully floorplanned FPGA configurations, in the form of EDIF netlists, are generated automatically from high-level descriptions of the matrix multiplication operation in less than 1 second. These are specifically optimised for Xilinx Virtex FPGA chips. By exploiting the abundance of logic resources in Xilinx Virtex FPGAs (look-up tables, fast carry logic, shift registers, flip-flops, etc.), a fully parallel implementation of the matrix multiplier core has been achieved, with a full matrix result generated every clock cycle. A 3×3 matrix multiplier instance consumes 2,448 Virtex slices and can run at 175 MHz on an XCV1000E-6 Virtex-E chip, thus performing over 4.7 billion MAC/sec. This corresponds to 175 million full 3×3 matrix results per second.
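The throughput figures quoted in the abstract can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes a standard dense product, in which each of the 9 output elements of a 3×3 result needs 3 multiply-accumulate (MAC) operations, and it takes the abstract's statement of one full result per clock cycle at face value. The function names are hypothetical and chosen only for illustration.

```python
# Illustrative check of the abstract's throughput numbers (not the authors' code).
# Assumption: a dense (rows x inner) by (inner x cols) product needs
# rows * inner * cols multiply-accumulate (MAC) operations.

def matmul_with_mac_count(a, b):
    """Reference software matrix product that also counts MAC operations."""
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [[0] * cols for _ in range(rows)]
    macs = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                result[i][j] += a[i][k] * b[k][j]  # one MAC
                macs += 1
    return result, macs


def mac_rate(rows, inner, cols, clock_hz, results_per_cycle=1):
    """MAC operations per second for a core that emits full results every cycle."""
    return rows * inner * cols * clock_hz * results_per_cycle


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

product, macs_per_result = matmul_with_mac_count(sample, identity)
clock_hz = 175_000_000  # 175 MHz, as quoted for the XCV1000E-6 device

print(macs_per_result)                 # MACs needed for one 3x3 result
print(mac_rate(3, 3, 3, clock_hz))     # MACs per second at one result per cycle
```

Under these assumptions, 27 MACs per result at 175 MHz gives 4.725 billion MAC/sec, which agrees with the "over 4.7 billion MAC/sec" and "175 million results per second" figures in the abstract.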

