{"title":"Softmax实现的设计空间探索","authors":"Zhigang Wei, Aman Arora, P. Patel, L. John","doi":"10.1109/ASAP49362.2020.00017","DOIUrl":null,"url":null,"abstract":"Deep Neural Networks (DNN) are crucial components of machine learning in the big data era. Significant effort has been put into the hardware acceleration of convolution and fully-connected layers of neural networks, while not too much attention has been put on the Softmax layer. Softmax is used in terminal classification layers in networks like ResNet, and is also used in intermediate layers in networks like the Transformer. As the speed for other DNN layers keeps improving, efficient and flexible designs for Softmax are required. With the existence of several ways to implement Softmax in hardware, we evaluate various softmax hardware designs and the trade-offs between them. In order to make the design space exploration more efficient, we also develop a parameterized generator which can produce softmax designs by varying multiple aspects of a base architecture. The aspects or knobs are parallelism, accuracy, storage and precision. The goal of the generator is to enable evaluation of tradeoffs between area, delay, power and accuracy in the architecture of a softmax unit. We simulate and synthesize the generated designs and present results comparing them with the existing state-of-the-art. Our exploration reveals that the design with parallelism of 16 can provide the best area-delay product among designs with parallelism ranging from 1 to 32. It is also observed that look-up table based approximate LOG and EXP units can be used to yield almost the same accuracy as the full LOG and EXP units, while providing area and energy benefits. Additionally, providing local registers for intermediate values is seen to provide energy savings.","PeriodicalId":375691,"journal":{"name":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Design Space Exploration for Softmax Implementations\",\"authors\":\"Zhigang Wei, Aman Arora, P. Patel, L. John\",\"doi\":\"10.1109/ASAP49362.2020.00017\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Neural Networks (DNN) are crucial components of machine learning in the big data era. Significant effort has been put into the hardware acceleration of convolution and fully-connected layers of neural networks, while not too much attention has been put on the Softmax layer. Softmax is used in terminal classification layers in networks like ResNet, and is also used in intermediate layers in networks like the Transformer. As the speed for other DNN layers keeps improving, efficient and flexible designs for Softmax are required. With the existence of several ways to implement Softmax in hardware, we evaluate various softmax hardware designs and the trade-offs between them. In order to make the design space exploration more efficient, we also develop a parameterized generator which can produce softmax designs by varying multiple aspects of a base architecture. The aspects or knobs are parallelism, accuracy, storage and precision. The goal of the generator is to enable evaluation of tradeoffs between area, delay, power and accuracy in the architecture of a softmax unit. 
We simulate and synthesize the generated designs and present results comparing them with the existing state-of-the-art. Our exploration reveals that the design with parallelism of 16 can provide the best area-delay product among designs with parallelism ranging from 1 to 32. It is also observed that look-up table based approximate LOG and EXP units can be used to yield almost the same accuracy as the full LOG and EXP units, while providing area and energy benefits. Additionally, providing local registers for intermediate values is seen to provide energy savings.\",\"PeriodicalId\":375691,\"journal\":{\"name\":\"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASAP49362.2020.00017\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP49362.2020.00017","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Design Space Exploration for Softmax Implementations
Deep Neural Networks (DNNs) are crucial components of machine learning in the big data era. Significant effort has gone into hardware acceleration of the convolution and fully-connected layers of neural networks, while comparatively little attention has been paid to the Softmax layer. Softmax is used in the terminal classification layers of networks like ResNet and in the intermediate layers of networks like the Transformer. As other DNN layers keep getting faster, efficient and flexible Softmax designs are required. Since there are several ways to implement Softmax in hardware, we evaluate various Softmax hardware designs and the trade-offs between them. To make the design space exploration more efficient, we also develop a parameterized generator that produces Softmax designs by varying multiple aspects, or knobs, of a base architecture: parallelism, accuracy, storage, and precision. The goal of the generator is to enable evaluation of the trade-offs between area, delay, power, and accuracy in the architecture of a Softmax unit. We simulate and synthesize the generated designs and present results comparing them with the existing state of the art. Our exploration reveals that, among designs with parallelism ranging from 1 to 32, a parallelism of 16 provides the best area-delay product. We also observe that look-up-table-based approximate LOG and EXP units yield almost the same accuracy as full LOG and EXP units while providing area and energy benefits. Additionally, providing local registers for intermediate values yields energy savings.
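To make the accuracy knob concrete, the following is a minimal Python sketch of a numerically stable softmax alongside a version that replaces the full EXP evaluation with a small look-up table, in the spirit of the LUT-based approximate EXP units the abstract mentions. The table size, input range, and nearest-entry lookup here are illustrative assumptions for a software model; they are not the paper's actual hardware generator or fixed-point formats.

import numpy as np

def softmax_reference(x):
    """Numerically stable softmax: subtract the max before EXP, then
    normalize. This is the baseline a 'full' EXP unit would compute."""
    shifted = x - np.max(x)
    e = np.exp(shifted)
    return e / np.sum(e)

def make_exp_lut(num_entries=256, x_min=-8.0, x_max=0.0):
    """Pre-compute a small EXP look-up table over [x_min, x_max].
    After max-subtraction every input is non-positive, so a bounded
    table suffices. The table size is an illustrative accuracy/area knob."""
    grid = np.linspace(x_min, x_max, num_entries)
    return grid, np.exp(grid)

def softmax_lut(x, lut):
    """Softmax using a LUT-based approximate EXP (nearest-entry lookup),
    modeling the accuracy trade-off of an approximate hardware EXP unit."""
    grid, exp_vals = lut
    shifted = np.clip(x - np.max(x), grid[0], grid[-1])
    idx = np.clip(np.searchsorted(grid, shifted), 0, len(grid) - 1)
    e = exp_vals[idx]
    return e / np.sum(e)

if __name__ == "__main__":
    logits = np.random.randn(10) * 3
    ref = softmax_reference(logits)
    approx = softmax_lut(logits, make_exp_lut(num_entries=256))
    print("max abs error vs. full EXP:", np.max(np.abs(ref - approx)))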