A reconfigurable low-power high-performance matrix multiplier design
Author: R. Lin
Published in: Proceedings IEEE 2000 First International Symposium on Quality Electronic Design (Cat. No. PR00525), 2000-03-20
DOI: 10.1109/ISQED.2000.838891 (https://doi.org/10.1109/ISQED.2000.838891)
A novel reconfigurable low-power high-performance matrix multiplier architecture and its component circuits are presented. The processor can be easily reconfigured to compute the product of matrices $X_{n \times k}$ and $Y_{k \times m}$ for any integers n, k, m and any item precision b (ranging from 4 to 64 bits), thus maximizing the utilization of the available hardware. As a typical example, the hardware equivalent to one $64 \times 64$-bit high-precision multiplier in the system can be directly reconfigured to produce the product of two matrices $X_{8 \times 8}$ and $Y_{8 \times 8}$ of 8-bit items in 9 pipeline cycles, a computation that would require 512 multiplications (done by large multipliers) in a non-reconfigurable high-precision system. Given an input stream of $h \times h$ matrix pairs with b-bit items, the processor, called a matrix multiplier of size s (note s = hb), may consist of an array of $(s/m)^2$ small $m \times m$ multipliers (the m = 4 case is illustrated), a few arrays of adders each adding three numbers, an array of accumulators, and corresponding simple reconfiguration switches. To compute the product of $X_{n \times k}$ and $Y_{k \times m}$ with item precision b on the proposed processor of size s, we only need to partition $X_{n \times k}$ and $Y_{k \times m}$ into $(s/b) \times (s/b)$ submatrices, reconfigure the processor according to the values of s (fixed) and b (input parameter), compute the products of the submatrices, and accumulate them into the desired result in pipelined fashion. A recently proposed shift switch logic, a nonbinary logic for arithmetic circuits, is utilized in the design. The novel logic operates on 4-bit state signals in which no more than half of the signal bits are subject to value change at any logic stage. As verified by SPICE simulation, this significantly reduces the power dissipation of the large circuit while maintaining high speed and a small VLSI area.
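The partition-and-accumulate scheme described in the abstract can be illustrated in software. The following is a minimal sketch under stated assumptions, not the paper's circuit. The function names `mul_from_small` and `blocked_matmul` are hypothetical, items are assumed to be unsigned, and the small-multiplier width is called `w` here (the abstract calls it m) to avoid confusion with the matrix dimension m. The pipelining, the three-input adder arrays, and the shift switch logic are not modeled.

```python
def mul_from_small(x, y, b, w=4):
    """Multiply two unsigned b-bit integers using only w-by-w-bit products.

    Software stand-in for an array of small multipliers; b must be a
    multiple of w (an assumption of this sketch).
    """
    assert b % w == 0, "precision must be a multiple of the small width"
    mask = (1 << w) - 1
    digits = b // w
    total = 0
    for i in range(digits):
        xi = (x >> (w * i)) & mask          # i-th w-bit digit of x
        for j in range(digits):
            yj = (y >> (w * j)) & mask      # j-th w-bit digit of y
            total += (xi * yj) << (w * (i + j))  # weighted partial product
    return total


def blocked_matmul(X, Y, s=64, b=8):
    """Compute X (n x k) times Y (k x m) by tiling into h x h blocks, h = s // b.

    Each tile product is accumulated into the result, mirroring the
    partition / multiply / accumulate steps of the abstract.
    """
    h = s // b                               # tile edge for this precision
    n, k, m = len(X), len(Y), len(Y[0])
    Z = [[0] * m for _ in range(n)]          # accumulators
    for i0 in range(0, n, h):
        for j0 in range(0, m, h):
            for k0 in range(0, k, h):
                # Product of one (up to) h x h tile pair, added to Z.
                for i in range(i0, min(i0 + h, n)):
                    for j in range(j0, min(j0 + h, m)):
                        acc = 0
                        for t in range(k0, min(k0 + h, k)):
                            acc += mul_from_small(X[i][t], Y[t][j], b)
                        Z[i][j] += acc
    return Z
```

With s = 64 and b = 8 the tile edge is h = 8, so a single tile pair involves 8 · 8 · 8 = 512 item multiplications, matching the count quoted in the abstract. Edge tiles smaller than h × h are handled by clamping the loop bounds, which is a choice of this sketch rather than something the abstract specifies.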