TernGEMM: GEneral Matrix Multiply Library with Ternary Weights for Fast DNN Inference
Seokhyeon Choi, Kyuhong Shim, Jungwook Choi, Wonyong Sung, B. Shim
2021 IEEE Workshop on Signal Processing Systems (SiPS), October 2021. DOI: 10.1109/SiPS52927.2021.00028

Abstract: Efficient implementation of deep neural networks (DNNs) on CPU-based systems is critical as applications spread to embedded and Internet of Things (IoT) devices. Many CPUs for personal computers and embedded systems provide Single Instruction Multiple Data (SIMD) instructions, which can be used to implement the efficient GEneral Matrix Multiply (GEMM) libraries that DNN inference depends on. Although many DNNs perform well even at 1-bit or 2-bit precision, current CPU instructions and libraries do not efficiently support arithmetic below 8 bits. We propose TernGEMM, a GEMM library built on SIMD instructions for DNN inference with ternary weights and sub-8-bit activations. TernGEMM improves speed by replacing slow multiply-add instructions with logical operations and by accumulating many products without bit-expansion operations. We compared TernGEMM's speedup against a tiling-optimized baseline and against GEMMLowp, an 8-bit precision GEMM library. On an Intel CPU, TernGEMM achieves speedups of ×2.052, ×2.973, and ×2.986 on ResNet-50, MobileNet-V2, and EfficientNet-B0, respectively; on an ARM CPU, the corresponding speedups are ×2.143, ×1.765, and ×1.856.
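The central trick in the TernGEMM abstract, replacing multiply-add with logical operations, can be illustrated with a minimal sketch (this is not the library's actual SIMD kernel): with binary activations and ternary weights in {-1, 0, +1} packed as two bitmasks, a dot product collapses to AND plus popcount.

```python
def ternary_dot(act_bits: int, w_pos: int, w_neg: int) -> int:
    """Dot product of a binary activation vector (packed into an int,
    one bit per element) with a ternary weight vector packed as two
    bitmasks: w_pos marks +1 weights, w_neg marks -1 weights.
    Multiply-add reduces to AND + popcount, mirroring how SIMD logical
    operations can replace arithmetic for sub-8-bit operands."""
    return bin(act_bits & w_pos).count("1") - bin(act_bits & w_neg).count("1")

# Activations x = [1, 1, 0, 1] (LSB first), weights w = [+1, -1, 0, +1]:
acts = 0b1011    # bit i set <=> x[i] == 1
w_pos = 0b1001   # bits where w[i] == +1
w_neg = 0b0010   # bits where w[i] == -1
print(ternary_dot(acts, w_pos, w_neg))  # 1*1 + 1*(-1) + 0*0 + 1*1 = 1
```

Packing many elements per machine word is what lets one logical instruction stand in for many multiplies.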
Efficient Neuromorphic Signal Processing with Loihi 2
G. Orchard, E. P. Frady, D. B. Rubin, S. Sanborn, S. Shrestha, F. Sommer, Mike Davies
2021 IEEE Workshop on Signal Processing Systems (SiPS), October 2021. DOI: 10.1109/SiPS52927.2021.00053

Abstract: The biologically inspired spiking neurons used in neuromorphic computing are nonlinear filters with dynamic state variables, very different from the stateless neuron models used in deep learning. The next version of Intel's neuromorphic research processor, Loihi 2, supports a wide range of stateful spiking neuron models with fully programmable dynamics. Here we showcase advanced spiking neuron models that can efficiently process streaming data, in simulation experiments on emulated Loihi 2 hardware. In one example, Resonate-and-Fire (RF) neurons are used to compute the Short-Time Fourier Transform (STFT) with similar computational complexity but 47× less output bandwidth than the conventional STFT. In another example, we describe an algorithm for optical flow estimation using spatiotemporal RF neurons that requires over 90× fewer operations than a conventional DNN-based solution. We also demonstrate promising preliminary results using backpropagation to train RF neurons for audio classification tasks. Finally, we show that a cascade of Hopf resonators, a variant of the RF neuron, replicates novel properties of the cochlea and motivates an efficient spike-based spectrogram encoder.
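The Resonate-and-Fire neuron behind the spike-based STFT above keeps a complex, oscillating membrane state, so each neuron acts as a damped resonator tuned to one frequency. A toy discrete-time version, purely illustrative (Loihi 2's actual neuron model and fixed-point details differ):

```python
import cmath
import math

def rf_response(inputs, omega, decay=0.02, dt=1.0):
    """Drive an RF-style complex state z with an input stream:
    z[t+1] = exp((-decay + 1j*omega) * dt) * z[t] + x[t].
    |z| grows when the input contains energy near omega."""
    coeff = cmath.exp((-decay + 1j * omega) * dt)
    z = 0j
    for x in inputs:
        z = coeff * z + x
    return abs(z)

# A 64-sample sinusoid at 0.3 rad/sample excites the matched resonator
# far more strongly than a detuned one.
signal = [math.cos(0.3 * t) for t in range(64)]
tuned = rf_response(signal, omega=0.3)
detuned = rf_response(signal, omega=1.2)
print(tuned > 3 * detuned)  # True: the tuned neuron resonates
```

A bank of such neurons at different omegas yields a running spectral estimate, which is the sense in which RF neurons compute an STFT-like transform.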
{"title":"A Stage-wise Conversion Strategy for Low-Latency Deformable Spiking CNN","authors":"Chunyu Wang, Jiapeng Luo, Zhongfeng Wang","doi":"10.1109/SiPS52927.2021.00009","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00009","url":null,"abstract":"Spiking neural networks (SNNs) are currently one of the most successful approaches to model the behavior and learning potential of the brain. Recently, they have obtained marvelous research interest thanks to their event-driven and energy-efficient characteristics. While difficult to directly train SNNs from scratch because of their non-differentiable spike operations, many works have focused on converting a trained DNN to the target SNN. However, there is no efficient method to convert the deformable convolutional layer which is frequently used in many applications. The deformable convolution layer enables deformation of the convolutional sampling grid by adding offsets to the regular sampling locations, which enhances the geometric transformation modeling capability of CNNs. In this work, we propose a novel deformable spiking CNN, which can successfully convert DNNs with deformable convolution layers to SNNs with much shorter simulation time and have low latency during inference while maintaining high accuracy. To be specific, we design an effective method dedicated for deformable convolution layers to be converted. By treating the offset prediction module as an embedded SNN, we calculate the spiking offsets multi times and use the average values as the final offsets for deformable convolution. We also propose a stage-wise DNN-SNN conversion strategy to further reduce the conversion error. We divide the network into several stages and convert each stage sequentially with retraining to diminish the difference between the source DNN and the target SNN as much as possible. 
The experiments on CIFAR-10 and CIFAR-100 datasets show that our method surpasses the state-of-the-art works both in conversion accuracy and inference latency.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"267 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133333101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
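The offset-handling step described above, averaging the embedded SNN's spiking offset predictions over several timesteps, is simple enough to sketch directly (names and values here are illustrative, not from the paper):

```python
def average_offsets(spiking_offsets):
    """Average per-timestep offset predictions from the embedded SNN
    to obtain the final offsets fed to the deformable convolution."""
    steps = len(spiking_offsets)
    return [sum(vals) / steps for vals in zip(*spiking_offsets)]

# Offsets predicted at 4 simulation timesteps for two sampling locations:
per_step = [[0.8, -1.2], [1.2, -0.8], [1.0, -1.0], [1.0, -1.0]]
print(average_offsets(per_step))  # ~[1.0, -1.0]
```

Averaging smooths out the quantization noise that individual spiking passes introduce into the sampling grid.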
{"title":"Design and Implementation of a Highly Accurate Stochastic Spiking Neural Network","authors":"Chengcheng Tang, Jie Han","doi":"10.1109/SiPS52927.2021.00050","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00050","url":null,"abstract":"The emergence of spiking neural networks (SNNs) provide a promising approach to the energy efficient design of artificial neural networks (ANNs). The rate encoded computation in SNNs utilizes the number of spikes in a time window to encode the intensity of a signal, in a similar way to the information encoding in stochastic computing. Inspired by this similarity, this paper presents a hardware design of stochastic SNNs that attains a high accuracy. A design framework is elaborated for the input, hidden and output layers. This design takes advantage of a priority encoder to convert the spikes between layers of neurons into index-based signals and uses the cumulative distribution function of the signals for spike train generation. Thus, it mitigates the problem of a relatively low information density and reduces the usage of hardware resources in SNNs. This design is implemented in field programmable gate arrays (FPGAs) and its performance is evaluated on the MNIST image recognition dataset. Hardware costs are evaluated for different sizes of hidden layers in the stochastic SNNs and the recognition accuracy is obtained using different lengths of stochastic sequences. The results show that this stochastic SNN framework achieves a higher accuracy compared to other SNN designs and a comparable accuracy as their ANN counterparts. 
Hence, the proposed SNN design can be an effective alternative to achieving high accuracy in hardware constrained applications.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127077346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
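The CDF-based spike-train generation mentioned above can be illustrated in software (a stand-alone sketch; the FPGA design uses a priority encoder and fixed-point hardware rather than floating-point sampling):

```python
import bisect
import random

def sample_indices_from_cdf(probs, n, rng):
    """Draw n spike indices from a categorical distribution by
    inverting its cumulative distribution function, mimicking
    index-based spike-train generation."""
    cdf, acc = [], 0.0
    for p in probs:
        acc += p
        cdf.append(acc)
    return [bisect.bisect_left(cdf, rng.random()) for _ in range(n)]

rng = random.Random(0)
idx = sample_indices_from_cdf([0.2, 0.5, 0.3], 10000, rng)
rates = [idx.count(i) / len(idx) for i in range(3)]
print(rates)  # approximately [0.2, 0.5, 0.3]
```

Counting spikes per index over the window recovers the encoded intensities, the same rate-decoding principle the paper shares with stochastic computing.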
{"title":"Scalable Hardware Architecture for Invertible Logic with Sparse Hamiltonian Matrices","authors":"N. Onizawa, A. Tamakoshi, T. Hanyu","doi":"10.1109/SiPS52927.2021.00047","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00047","url":null,"abstract":"We introduce a scalable hardware architecture for large-scale invertible logic. Invertible logic has been recently presented that can realize bidirectional computing probabilis-tically based on Hamiltonians with a small number of non-zero elements. In order to store and compute the Hamiltonians efficiently in hardware, a sparse matrix representation of PTELL (partitioned and transposed ELLPACK) is proposed. A memory size of PTELL can be smaller than that of a conventional ELL by reducing the number of paddings while parallel reading of non-zero values are realized for high-throughput operations. As a result, the proposed scalable invertible-logic hardware based on PTELL is designed on Xilinx KC705 FPGA board, which achieves two orders of magnitude faster than an 8-core CPU implementation.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126098335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ComplexBeat: Breathing Rate Estimation from Complex CSI
Sitian Li, Andreas Toftegaard Kristensen, A. Burg, Alexios Balatsoukas-Stimming
2021 IEEE Workshop on Signal Processing Systems (SiPS), October 2021. DOI: 10.1109/SiPS52927.2021.00046

Abstract: In this paper, we explore the use of channel state information (CSI) from a WiFi system to estimate the breathing rate of a person in a room. To extract the WiFi CSI components that are sensitive to breathing, we propose working with the delay-domain channel impulse response (CIR), whereas most state-of-the-art methods use its frequency-domain representation. One obstacle in processing CSI data is that its amplitude and phase are highly distorted by measurement uncertainties. We therefore also propose an amplitude calibration method and a phase-offset calibration method for CSI measured in orthogonal frequency-division multiplexing (OFDM) multiple-input multiple-output (MIMO) systems. Finally, we implement a complete breathing rate estimation system to showcase the effectiveness of the proposed calibration and CSI extraction methods.
{"title":"Understanding the Energy vs. Adversarial Robustness Trade-Off in Deep Neural Networks","authors":"Kyungmi Lee, A. Chandrakasan","doi":"10.1109/SiPS52927.2021.00017","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00017","url":null,"abstract":"Adversarial examples, which are crafted by adding small inconspicuous perturbations to typical inputs in order to fool the prediction of a deep neural network (DNN), can pose a threat to security-critical applications, and robustness against adversarial examples is becoming an important factor for designing a DNN. In this work, we first examine the methodology for evaluating adversarial robustness that uses the first-order attack methods, and analyze three cases when this evaluation methodology overestimates robustness: 1) numerical saturation of cross-entropy loss, 2) non-differentiable functions in DNNs, and 3) ineffective initialization of the attack methods. For each case, we propose compensation methods that can be easily combined with the existing attack methods, thus provide a more precise evaluation methodology for robustness. Second, we benchmark the relationship between adversarial robustness and inference-time energy at an embedded hardware platform using our proposed evaluation methodology, and demonstrate that this relationship can be obscured by the three cases behind overestimation. 
Overall, our work shows that the robustness-energy trade-off has differences from the conventional accuracy-energy trade-off, and highlights importance of the precise evaluation methodology for robustness.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134300868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
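The first overestimation case, numerical saturation of the cross-entropy loss, is easy to reproduce: once the network is extremely confident, the cross-entropy gradient underflows to zero and a first-order attack receives no signal, while a logit-margin loss remains informative. A two-class illustration in the spirit of such compensations (not the paper's exact formulation):

```python
import math

def ce_grad_wrt_logits(logits, label):
    """Gradient of cross-entropy w.r.t. the logits: softmax(z) - onehot."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s - (1.0 if i == label else 0.0)
            for i, e in enumerate(exps)]

def margin_loss(logits, label):
    """CW-style logit margin: max_other(z) - z_label. Unlike saturated
    cross-entropy, it still changes as the logits change."""
    other = max(z for i, z in enumerate(logits) if i != label)
    return other - logits[label]

confident = [100.0, 0.0]                  # extremely confident class-0 logits
g = ce_grad_wrt_logits(confident, 0)
print(max(abs(v) for v in g) < 1e-30)     # True: the CE gradient has vanished
print(margin_loss(confident, 0))          # -100.0: the margin still gives signal
```

An attack that sees only the vanished gradient stalls and reports spurious robustness, which is precisely the overestimation the paper warns about.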
{"title":"Design and Implementation of Autoencoder-LSTM Accelerator for Edge Outlier Detection","authors":"Nadya A. Mohamed, Joseph R. Cavallaro","doi":"10.1109/SiPS52927.2021.00032","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00032","url":null,"abstract":"Sensors are used to monitor various parameters in many real-world applications. Sudden changes in the underlying patterns of the sensors readings may represent events of interest. Therefore, event detection, an important temporal version of outlier detection, is one of the primary motivating applications in sensor networks. This work describes the implementation of a real-time outlier detection that uses an Autoencoder-LSTM neural-network accelerator implemented on the Xilinx PYNQ-Z1 development board. The implemented accelerator consists of a fine-tuned Autoencoder to extract the latent features in sensor data followed by a Long short-term memory (LSTM) network to predict the next step and detect outliers in real-time. The implemented design achieves 2.06 ms minimum latency and 85.9 GOp/s maximum throughput. The low latency and 0.25 W power consumption of the Autoencoder-LSTM outlier detector makes it suitable for resource-constrained computing platforms.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125589085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OneAI - Novel Multipurpose Deep Learning Algorithms for UWB Wireless Networks","authors":"A. Abbasi, Huaping Liu","doi":"10.1109/SiPS52927.2021.00031","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00031","url":null,"abstract":"In this paper, novel multipurpose deep learning algorithms are proposed for ultra-wideband (UWB) wireless networks that are capable of identifying the channel environment, estimating the SNR level, and performing ToA estimation, simultaneously. UWB technology is among the rapid-growing solutions for the next generation of deep learning-based wireless communication and localization systems. Existing deep learning algorithms for UWB wireless networks have addressed the various signal processing tasks individually in separate deep learning modules. This, however, increases the computational complexity, power consumption, and overall latency of the models. In this paper, unlike the existing methods, the desired signal processing tasks are performed in one single deep learning module. The proposed model consists of a main deep learning module as the core of the model that extracts low-level information from the signal and several shallow learning networks to extract high-level information. We demonstrate that the low-level information that is extracted in the core deep learning module can be reused in all separate tasks. 
The performance of the proposed models is investigated against the standard IEEE 802.15.4a channel model by evaluating various metrics such as accuracy, area under the curve (AUC), precision, mean absolute error (MAE), and mean square error (MSE).","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129967532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
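The architectural idea above, one shared backbone feeding several shallow task heads, can be sketched in a few lines. Everything here is illustrative: the feature choices and head rules are toys, not the paper's networks.

```python
def backbone(signal):
    """Shared low-level feature extraction, computed once per signal
    and reused by every task head."""
    mean = sum(signal) / len(signal)
    power = sum(x * x for x in signal) / len(signal)
    peak = max(abs(x) for x in signal)
    return (mean, power, peak)

# Shallow task-specific heads reusing the same features:
def env_head(feat):  # toy channel-environment classifier
    return "LOS" if feat[2] > 2.0 else "NLOS"

def snr_head(feat):  # toy SNR proxy from signal power
    return 10.0 * feat[1]

def toa_head(feat):  # toy ToA proxy from peak prominence
    return feat[2] - feat[0]

sig = [0.1, 0.2, 3.0, 0.1]
feat = backbone(sig)                         # extracted once
print(env_head(feat), snr_head(feat), toa_head(feat))
```

The saving the paper targets comes from running the expensive shared extraction once instead of once per task.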
{"title":"Compressive Estimation of Wideband mmW Channel using Analog True-Time-Delay Array","authors":"Veljko Boljanovic, D. Cabric","doi":"10.1109/SiPS52927.2021.00038","DOIUrl":"https://doi.org/10.1109/SiPS52927.2021.00038","url":null,"abstract":"High-rate directional communication in millimeterwave (mmW) systems requires a fast and accurate channel estimation. Novel array architectures and signal processing techniques are needed to avoid prohibitive estimation overhead associated with large antenna arrays. Recent advancements in hardware design helped the re-emergence of true-time-delay (TTD) arrays whose frequency-dependent beams can be leveraged for low-overhead channel probing and estimation. In this work, we consider an analog TTD array and develop a low-overhead compressive sensing based algorithm for channel estimation in frequency-domain. The algorithm is compared with related state-of-the-art approaches designed for analog phased antenna arrays. Our results reveal the advantages of the proposed TTD-based algorithm in terms of the required number of training symbols, estimation accuracy, and computational complexity.","PeriodicalId":103894,"journal":{"name":"2021 IEEE Workshop on Signal Processing Systems (SiPS)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121333296","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}