COMPACT: Co-processor for Multi-mode Precision-adjustable Non-linear Activation Functions
Wenhui Ou, Zhuoyu Wu, Z. Wang, Chao Chen, Yongkui Yang
2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), April 2023
DOI: 10.23919/DATE56975.2023.10137019 (https://doi.org/10.23919/DATE56975.2023.10137019)
Citations: 0
Abstract
Non-linear activation functions imitating neuron behaviors are ubiquitous in machine learning algorithms for time-series signals and also yield significant precision gains in conventional vision-based deep learning networks. State-of-the-art implementations of such functions on GPU-like devices incur a large physical cost, whereas edge devices adopt either linear interpolation or simplified linear functions, leading to degraded precision. In this work, we design COMPACT, a co-processor with adjustable precision for multiple non-linear activation functions, including but not limited to exponent, sigmoid, tangent, logarithm, and mish. Benchmarked against state-of-the-art designs, COMPACT achieves a 26% reduction in absolute error over a 1.6x wider approximation range by exploiting a triple-decomposition technique inspired by Hajduk's formula for Padé approximation. A SIMD-ISA-based vector co-processor implemented on FPGA reduces execution latency by 30% while keeping area overhead nearly the same as related designs. Furthermore, COMPACT's precision can be relaxed to achieve a 46% latency improvement when a maximum absolute error on the order of 1E-3 is tolerable.
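The abstract only names the underlying mathematics, so as a rough illustration of the rational-approximation idea, the sketch below evaluates the classical [3/2] Padé approximant of tanh and derives sigmoid from the same rational core. This is a minimal sketch of generic Padé approximation under our own assumptions, not COMPACT's triple-decomposition hardware scheme; the function names and the error-print harness are illustrative inventions, not the paper's API.

```python
import math

def tanh_pade(x: float) -> float:
    """[3/2] Pade approximant of tanh: x*(15 + x^2) / (15 + 6*x^2).

    Matches the Taylor series of tanh through the x^5 term, so it is
    accurate near x = 0 and degrades as |x| grows. Illustrative only --
    COMPACT's triple decomposition (after Hajduk's formula) is not
    reproduced here.
    """
    x2 = x * x
    return x * (15.0 + x2) / (15.0 + 6.0 * x2)

def sigmoid_pade(x: float) -> float:
    # sigmoid(x) = 0.5 * (1 + tanh(x / 2)), so a single rational core
    # can serve several of the activation functions the paper targets.
    return 0.5 * (1.0 + tanh_pade(0.5 * x))

if __name__ == "__main__":
    # Compare the approximant against the exact tanh over a few inputs.
    for x in (0.25, 0.5, 1.0, 2.0):
        err = abs(tanh_pade(x) - math.tanh(x))
        print(f"x={x:4.2f}  pade={tanh_pade(x):+.6f}  abs_err={err:.2e}")
```

On this sample the absolute error stays around 1E-4 for |x| <= 1 and grows to roughly 1E-2 at x = 2, which is why range-widening techniques such as the paper's triple decomposition matter for hardware approximators with fixed error budgets.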