A Workload Sensitive Dynamic Scaling Matrix Multiplier Structure
Yuran Qiao, Junzhong Shen, Tao Xiao, Qianming Yang
2016 8th International Conference on Computational Intelligence and Communication Networks (CICN), December 2016. DOI: 10.1109/CICN.2016.113
Matrix multiplication is one of the most widely used computational kernels in scientific computing and machine learning. Using dedicated circuits for matrix multiplication can reduce both computation time and energy consumption. Traditional matrix multipliers typically adopt a linear array architecture, which works inefficiently when the matrix sub-block size is much smaller than the array length. A short array structure can improve computational efficiency, but at the cost of consuming more memory bandwidth. In this paper, we present a workload-sensitive, dynamically scaling matrix multiplier structure that adjusts the array length according to the matrix size. We built a prototype system on a Xilinx Zynq XC7Z045 FPGA. The results show that, compared with a fixed-length array architecture, our design achieves much better performance and requires less memory bandwidth.
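The trade-off between array length, utilization, and bandwidth can be illustrated with a simplified, hypothetical cost model. This is a sketch under stated assumptions, not the authors' design: it assumes a pool of `total_pes` processing elements (PEs) that can be partitioned into equal-length sub-arrays, that a square sub-block wider than a sub-array is processed in several passes, and that each independent sub-array consumes its own input stream, so bandwidth grows with the number of sub-arrays. The names `ArrayConfig`, `evaluate`, and `choose_length`, and the candidate lengths, are illustrative only.

```python
from dataclasses import dataclass
from math import ceil
from typing import Sequence


@dataclass(frozen=True)
class ArrayConfig:
    length: int          # PEs per sub-array (hypothetical model parameter)
    utilization: float   # fraction of PEs doing useful work, averaged over passes
    bandwidth: int       # input streams needed, assuming one per sub-array


def evaluate(total_pes: int, length: int, block: int) -> ArrayConfig:
    """Score one sub-array length for a square sub-block of size `block`."""
    if length <= 0 or total_pes % length != 0:
        raise ValueError("length must be a positive divisor of total_pes")
    # A sub-block wider than the sub-array is processed in several passes;
    # the last pass may leave some PEs idle.
    passes = ceil(block / length)
    utilization = block / (passes * length)
    # Assumed cost model: every independent sub-array needs its own input stream.
    bandwidth = total_pes // length
    return ArrayConfig(length, utilization, bandwidth)


def choose_length(total_pes: int, block: int, candidates: Sequence[int]) -> ArrayConfig:
    """Pick the candidate length with the highest utilization,
    breaking ties in favour of lower bandwidth (longer sub-arrays)."""
    configs = [evaluate(total_pes, length, block) for length in candidates]
    return max(configs, key=lambda c: (c.utilization, -c.bandwidth))


if __name__ == "__main__":
    candidates = [8, 16, 32, 64]  # hypothetical supported lengths for 64 PEs
    for block in (8, 16, 64):
        fixed = evaluate(64, 64, block)           # one fixed 64-PE linear array
        dynamic = choose_length(64, block, candidates)
        print(block, fixed.utilization, dynamic.length,
              dynamic.utilization, dynamic.bandwidth)
    # prints:
    # 8 0.125 8 1.0 8
    # 16 0.25 16 1.0 4
    # 64 1.0 64 1.0 1
```

Under this model, a fixed 64-PE array keeps only 12.5% of its PEs busy on an 8×8 sub-block, while a fixed 8-PE partition would need eight input streams even for 64×64 sub-blocks; selecting the length per workload avoids both extremes. Real hardware also pays for reconfiguration, buffering, and control logic, which this sketch ignores.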