基于流水线矩阵乘法加速设计和非均匀量化的深度学习推理方案

2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS) Pub Date : 2021-10-10 DOI:10.1109/CCIS53392.2021.9754668

Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei Li, Jiyuan Zhang, Guan Wang, Zhen Chen

{"title":"基于流水线矩阵乘法加速设计和非均匀量化的深度学习推理方案","authors":"Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei Li, Jiyuan Zhang, Guan Wang, Zhen Chen","doi":"10.1109/CCIS53392.2021.9754668","DOIUrl":null,"url":null,"abstract":"Matrix multiplication is the bedrock in Deep Learning inference application. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology. The implementation is running on Field-programmable Gate Array (FPGA) devices and tested its performance on handwritten digit classification and Q-learning tasks. Results show that our method can achieve better performance with fewer power consumption.","PeriodicalId":191226,"journal":{"name":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization\",\"authors\":\"Yuyang Zhang, Dik Hin Leung, Min Guo, Yijia Xiao, Haoyue Liu, Yunfei Li, Jiyuan Zhang, Guan Wang, Zhen Chen\",\"doi\":\"10.1109/CCIS53392.2021.9754668\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Matrix multiplication is the bedrock in Deep Learning inference application. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology. The implementation is running on Field-programmable Gate Array (FPGA) devices and tested its performance on handwritten digit classification and Q-learning tasks. Results show that our method can achieve better performance with fewer power consumption.\",\"PeriodicalId\":191226,\"journal\":{\"name\":\"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)\",\"volume\":\"60 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-10-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCIS53392.2021.9754668\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCIS53392.2021.9754668","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

矩阵乘法是深度学习推理应用的基础。当涉及到边缘计算设备上的硬件加速时，矩阵乘法通常会占用大部分时间。为了在边缘计算中获得更好的性能，我们引入了一种基于流水线矩阵乘法方案和非均匀量化方法的低功耗多层感知器(MLP)加速器。该实现在现场可编程门阵列(FPGA)器件上运行，并测试了其在手写数字分类和Q-learning任务上的性能。结果表明，该方法可以在较低的功耗下获得较好的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A Deep Learning Inference Scheme Based on Pipelined Matrix Multiplication Acceleration Design and Non-uniform Quantization

Matrix multiplication is the bedrock in Deep Learning inference application. When it comes to hardware acceleration on edge computing devices, matrix multiplication often takes up a great majority of the time. To achieve better performance in edge computing, we introduce a low-power Multi-layer Perceptron (MLP) accelerator based on a pipelined matrix multiplication scheme and a nonuniform quantization methodology. The implementation is running on Field-programmable Gate Array (FPGA) devices and tested its performance on handwritten digit classification and Q-learning tasks. Results show that our method can achieve better performance with fewer power consumption.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE 7th International Conference on Cloud Computing and Intelligent Systems (CCIS)

自引率

0.00%

发文量