A reconfigurable low-power high-performance matrix multiplier design

Proceedings IEEE 2000 First International Symposium on Quality Electronic Design (Cat. No. PR00525). Pub Date: 2000-03-20. DOI: 10.1109/ISQED.2000.838891

R. Lin


Citations: 16

Abstract


A reconfigurable low-power high-performance matrix multiplier design

A novel reconfigurable low-power, high-performance matrix multiplier architecture and its component circuits are presented. The processor can be easily reconfigured to compute the product of matrices $X_{n \times k}$ and $Y_{k \times m}$ for any integers n, k, m and any item precision b (ranging from 4 to 64 bits), thus maximizing utilization of the available hardware. As a typical example, the hardware equivalent to one $64 \times 64$-bit high-precision multiplier in the system can be directly reconfigured to produce the product of two matrices $X_{8 \times 8}$ and $Y_{8 \times 8}$ of 8-bit items in 9 pipeline cycles, a task that would require 512 multiplications (done by large multipliers) in a non-reconfigurable high-precision system. Given an input stream of $h \times h$ matrix pairs with b-bit items, the processor, called a matrix multiplier of size s (where $s = hb$), may consist of an array of $(s/m)^2$ small $m \times m$ multipliers (the case $m = 4$ is illustrated), a few arrays of adders each adding three numbers, an array of accumulators, and corresponding simple reconfiguration switches. To compute the product of $X_{n \times k}$ and $Y_{k \times m}$ with item precision b on the proposed processor of size s, we only need to partition $X_{n \times k}$ and $Y_{k \times m}$ into $(s/b) \times (s/b)$ submatrices, reconfigure the processor according to the values of s (fixed) and b (input parameter), compute the products of the submatrices, and accumulate them into the desired result in pipelined fashion. A recently proposed shift switch logic, a nonbinary logic for arithmetic circuits, is used in the design. The novel logic operates on 4-bit state signals in which no more than half of the signal bits are subject to a value change at any logic stage. As verified by SPICE simulation, this significantly reduces the power dissipation of the large circuit while maintaining high speed and a small VLSI area.
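The partition-and-accumulate procedure described in the abstract can be illustrated in software. The following is a minimal sketch only, assuming a plain Python model of the dataflow: the function names `small_multiplier_count` and `blocked_matmul` are hypothetical, each block product stands in for one pass through the processor, and neither the pipeline timing nor the shift switch logic circuits are modeled.

```python
def small_multiplier_count(s: int, m: int = 4) -> int:
    """Number of m x m small multipliers in a processor of size s: (s/m)^2."""
    if s % m != 0:
        raise ValueError("s must be a multiple of m")
    return (s // m) ** 2


def blocked_matmul(X, Y, s: int, b: int):
    """Multiply X (n x k) by Y (k x m) using h x h blocks, where h = s / b.

    Returns the product and the number of block products performed.
    Hypothetical software model: one block product represents one
    submatrix multiplication on the reconfigured processor.
    """
    if s % b != 0:
        raise ValueError("item precision b must divide processor size s")
    h = s // b
    n, k, m = len(X), len(Y), len(Y[0])
    if any(len(row) != k for row in X):
        raise ValueError("inner dimensions of X and Y do not match")

    Z = [[0] * m for _ in range(n)]
    block_products = 0
    for i0 in range(0, n, h):          # block row of X and Z
        for j0 in range(0, m, h):      # block column of Y and Z
            for p0 in range(0, k, h):  # block index along the shared dimension
                block_products += 1
                # Multiply one pair of submatrices; ragged edge blocks are
                # clipped, which is equivalent to padding them with zeros.
                for i in range(i0, min(i0 + h, n)):
                    for j in range(j0, min(j0 + h, m)):
                        acc = 0
                        for p in range(p0, min(p0 + h, k)):
                            acc += X[i][p] * Y[p][j]
                        Z[i][j] += acc  # accumulate partial block products
    return Z, block_products


if __name__ == "__main__":
    # The abstract's example: s = 64 and b = 8 give h = 8, so an 8 x 8
    # matrix pair is a single block product.
    identity = [[int(i == j) for j in range(8)] for i in range(8)]
    _, passes = blocked_matmul(identity, identity, s=64, b=8)
    print(passes, small_multiplier_count(64), 8 * 8 * 8)
```

With s = 64 and m = 4 the count $(s/m)^2$ evaluates to 256 small multipliers, and the figure of 512 multiplications quoted for the $8 \times 8$ case matches the $8 \cdot 8 \cdot 8$ scalar products of a conventional matrix multiplication.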
