{"title":"A “New Ara” for Vector Computing: An Open Source Highly Efficient RISC-V V 1.0 Vector Processor Design","authors":"Matteo Perotti, Matheus A. Cavalcante, Nils Wistoff, Renzo Andri, L. Cavigelli, L. Benini","doi":"10.1109/ASAP54787.2022.00017","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00017","url":null,"abstract":"Vector architectures are gaining traction for highly efficient processing of data-parallel workloads, driven by all major ISAs (RISC-V, Arm, Intel), and boosted by landmark chips, like the Arm SVE-based Fujitsu A64FX, powering the TOP500 leader Fugaku. The RISC-V V extension has recently reached 1.0-Frozen status. Here, we present its first open-source implementation, discuss the new specification's impact on the micro-architecture of a lane-based design, and provide insights on performance-oriented design of coupled scalar-vector processors. Our system achieves comparable/better PPA than state-of-the-art vector engines that implement older RVV versions: 15% better area, 6% improved throughput, and FPU utilization >98.5% on crucial kernels.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130134244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mask-Net: A Hardware-efficient Object Detection Network with Masked Region Proposals","authors":"Han-Chen Chen, Cong Hao","doi":"10.1109/ASAP54787.2022.00030","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00030","url":null,"abstract":"Object detection on embedded systems is challenging because it is hard to achieve real-time inference with low energy consumption and limited hardware resources. Another challenge is to find hardware-friendly methods to avoid redundant computation. To address these challenges, in this work, we propose Mask-Net, a hardware-efficient object detection network with masked region proposals in regular shapes. First, we propose a hardware-friendly region proposal method that avoids redundant computation as much and as early as possible, with slight or no accuracy loss. Second, we demonstrate that our method is generalizable by applying it to several detection backbones, including SkyNet, ResNet-18 and UltraNet. Our method performs well in different scenarios, including the DAC-SDC, UAV123 and OTB100 datasets. We choose SkyNet as our base model to design an accelerator and verify our design on a Xilinx ZCU106 FPGA. We observe a speedup of 1.3× and about 30% lower energy consumption when the FPGA runs at frequencies from 124 MHz to 214 MHz, with only a slight accuracy loss. We also conduct a design space exploration and demonstrate that our accelerator can achieve a theoretical speedup of 1.76× with masked region proposals. This is achieved by optimally allocating DSPs to different parts of the accelerator to balance the computations before and after the mask.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116932195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IMEC: A Memory-Efficient Convolution Algorithm For Quantised Neural Network Accelerators","authors":"Eashan Wadhwa, Shashwat Khandelwal, Shanker Shreejith","doi":"10.1109/ASAP54787.2022.00027","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00027","url":null,"abstract":"Quantised convolutional neural networks (QCNNs) on FPGAs have shown tremendous potential for deploying deep learning on resource-constrained devices closer to the data source or in embedded applications. An essential building block of (Q)CNNs is the convolutional layer. FPGA implementations use modified versions of convolution kernels to reduce resource overheads using variations of the sliding-kernel algorithm. While these alleviate resource consumption to a certain degree, they still incur considerable (distributed) memory resources, requiring larger FPGA devices with sufficient on-chip memory elements to implement deep QCNNs. In this paper, we present the Inverse Memory Efficient Convolution (IMEC) algorithm, a novel strategy to lower the memory consumption of convolutional layers in QCNNs. IMEC lowers the footprint of the intermediate matrix buffers incurred within the convolutional layers and the multiply-accumulate (MAC) operators required at each layer through a series of data-organisation and computational optimisations. We evaluate IMEC by integrating it into the BNN-PYNQ framework, which compiles high-level QCNN representations to an FPGA bitstream. Our results show that IMEC can optimise the memory footprint and the overall resource overhead of the convolutional layers by ~33% and ~20% (LUT and FF count) respectively, across multiple quantisation levels (1-bit to 8-bit), while maintaining inference accuracy identical to state-of-the-art QCNN implementations.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122361073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
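As background on where such intermediate buffers come from (a hedged illustration only, not the IMEC algorithm itself, whose data-organisation optimisations are described in the paper): a GEMM-based convolutional layer materialises a K·K-times-expanded im2col matrix, whereas a direct sliding-window convolution keeps only the output buffer.

```python
import numpy as np

def direct_conv2d(x, w):
    """Direct 2-D convolution, valid padding: x is (H, W), w is (K, K).

    Avoids the O(K*K*H*W) im2col buffer a GEMM-based layer would hold;
    only the (H-K+1, W-K+1) output is materialised.
    """
    H, W = x.shape
    K = w.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # one MAC reduction per output element, over a K x K window
            out[i, j] = np.sum(x[i:i + K, j:j + K] * w)
    return out
```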
{"title":"High-Performance AKAZE Implementation Including Parametrizable and Generic HLS Modules","authors":"Matthias Nickel, Lester Kalms, Tim Häring, D. Göhringer","doi":"10.1109/ASAP54787.2022.00031","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00031","url":null,"abstract":"The amount of image data to be processed has increased tremendously over the last decades. One major computer vision task is the extraction of information to find patterns in and between images. One well-studied pattern recognition algorithm is AKAZE, which builds a nonlinear scale space to detect features. While more efficient than its predecessor KAZE, the computational demands of AKAZE are still high. Since many real-world computer vision applications require fast computations, sometimes under hard power and time constraints, FPGAs have become a focus as a suitable target platform. This work presents a highly modularized and parameterizable implementation of the AKAZE feature detection algorithm integrated into HiFlipVX, a High-Level Synthesis library based on the OpenVX standard. The fine-granular modularization and the generic design of the implemented functions allow them to be easily reused, improving the workflow for other computer vision algorithms. The high degree of parameterization and the extension of the library also enable a fast and extensive exploration of the design space. The proposed design achieved high repeatability and a frame rate of up to 480 frames per second for an image resolution of 1920×1080, comparing favourably with related work.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128253849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Session on European Acceleration Technologies","authors":"","doi":"10.1109/asap54787.2022.00011","DOIUrl":"https://doi.org/10.1109/asap54787.2022.00011","url":null,"abstract":"","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125497808","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low-precision logarithmic arithmetic for neural network accelerators","authors":"Maxime Christ, F. D. Dinechin, F. Pétrot","doi":"10.1109/ASAP54787.2022.00021","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00021","url":null,"abstract":"Resource requirements for hardware acceleration of neural network inference are notoriously high, both in terms of computation and storage. One way to mitigate this issue is to quantize parameters and activations. This is usually done by scaling and centering the distributions of weights and activations, on a kernel-per-kernel basis, so that a low-precision binary integer representation can be used. This work studies the low-precision logarithmic number system (LNS) as an efficient alternative. Firstly, LNS has a wider dynamic range than fixed-point for the same number of bits. Thus, when quantizing MNIST and CIFAR reference networks without retraining, the smallest format size achieving top-1 accuracy comparable to floating-point is 1 to 3 bits smaller with LNS than with fixed-point. In addition, it is shown that the zero bit of classical LNS is not needed in this context, and that the sign bit can be saved for activations. The proposed LNS neuron is detailed and its implementation on FPGA is shown to be smaller and faster than a fixed-point one for comparable accuracy. Secondly, low-precision LNS enables efficient inference architectures where 1/ multiplications reduce to additions; 2/ the weighted inputs are converted to the classical linear domain, but the tables needed for this conversion remain very small thanks to the low precision; and 3/ the conversion of the output activation back to LNS can be merged with an arbitrary activation function.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125695746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
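The core LNS property this abstract relies on, multiplication turning into addition of log codes, can be sketched in a few lines; the 4-fractional-bit quantisation and the function names below are illustrative, not the paper's actual format.

```python
import math

def to_lns(x, frac_bits=4):
    """Quantise a positive real to a fixed-point base-2 logarithm code."""
    scale = 1 << frac_bits
    return round(math.log2(x) * scale)

def lns_mul(a_log, b_log):
    """In LNS, multiplication is just integer addition of the log codes."""
    return a_log + b_log

def from_lns(x_log, frac_bits=4):
    """Convert a log code back to the linear domain."""
    scale = 1 << frac_bits
    return 2.0 ** (x_log / scale)

# 3 * 5 computed with one integer addition, up to quantisation error
product = from_lns(lns_mul(to_lns(3.0), to_lns(5.0)))
```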
{"title":"Fast Heterogeneous Task Mapping for Reducing Edge DNN Latency","authors":"Murray L. Kornelsen, S. H. Mozafari, J. Clark, B. Meyer, W. Gross","doi":"10.1109/ASAP54787.2022.00020","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00020","url":null,"abstract":"To meet DNN inference latency constraints on resource-constrained edge devices, we employ heterogeneous computing, utilizing multiple processing elements (e.g. CPU + GPU) to accelerate inference. This leads to the challenge of efficiently mapping DNN operations to heterogeneous processing elements. For this task, we introduce a novel genetic algorithm (GA) optimizer. Through intelligent initialization and a customized mutation operation, we are able to evaluate 20x fewer generations while finding superior configurations compared with a baseline GA. Using our mapping optimizer, we find device placement configurations that achieve 15%, 24%, and 31% inference speed-up for BERT, SqueezeBERT, and InceptionV3, respectively.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127723836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
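A minimal sketch of the two GA ingredients named above, intelligent initialisation and a customised mutation, applied to a device-placement chromosome (one device per DNN op). The device list, mutation rate and seeding heuristic are hypothetical, and the fitness function (measured inference latency) is omitted.

```python
import random

DEVICES = ["cpu", "gpu"]  # hypothetical processing elements

def init_population(n_ops, pop_size, seed_mapping=None):
    """Intelligent initialisation: seed the population with a heuristic
    mapping (e.g. all ops on one device) instead of pure random noise."""
    pop = []
    if seed_mapping is not None:
        pop.append(list(seed_mapping))
    while len(pop) < pop_size:
        pop.append([random.choice(DEVICES) for _ in range(n_ops)])
    return pop

def mutate(mapping, rate=0.1):
    """Customised mutation: reassign a small fraction of ops to a
    (possibly different) device, keeping the rest of the mapping intact."""
    return [random.choice(DEVICES) if random.random() < rate else d
            for d in mapping]
```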
{"title":"Aggressive Performance Improvement on Processing-in-Memory Devices by Adopting Hugepages","authors":"P. C. Santos, Bruno E. Forlin, M. Alves, L. Carro","doi":"10.1109/ASAP54787.2022.00019","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00019","url":null,"abstract":"Processing-in-Memory (PIM) devices integrated into general-purpose systems demand virtual memory support. In this way, these devices can be seamlessly coupled to the software stack, while maintaining compatibility and security provided by address management via the Operating System (OS) without requiring disruptive programming efforts. Typically, PIM intends to access large volumes of data via vector operations, and thus can suffer severe penalties due to the high cost of page misses in the Translation Look-aside Buffer (TLB). Our study demonstrates the criticality of such penalties on the system's performance and that PIM must resort to large page sizes. The presented results exploit the native large pages available on the host, and they show substantial performance improvements (84×) for wide-vector PIM operations with large pages.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126843147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
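How an application can ask the host for large pages can be sketched as follows; this is a generic Linux sketch, not the paper's PIM setup. MADV_HUGEPAGE requests transparent huge pages and is advisory and Linux-only, so the code degrades gracefully elsewhere.

```python
import mmap

def alloc_vector_buffer(n_bytes):
    """Back a large vector buffer with huge pages where available, so wide
    vector accesses touch far fewer TLB entries."""
    buf = mmap.mmap(-1, n_bytes)              # anonymous mapping
    if hasattr(mmap, "MADV_HUGEPAGE"):        # Linux, Python >= 3.8
        buf.madvise(mmap.MADV_HUGEPAGE)       # advisory huge-page request
    return buf

buf = alloc_vector_buffer(64 << 20)           # 64 MiB working set
buf[0:8] = b"\x01" * 8                        # touch the first page
```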
{"title":"Design Space Exploration for Memory-Oriented Approximate Computing Techniques","authors":"Hugo Miomandre, J. Nezan, D. Ménard","doi":"10.1109/ASAP54787.2022.00028","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00028","url":null,"abstract":"Modern digital systems are processing more and more data. This increase in memory requirements must match the processing capabilities and interconnections to avoid the memory wall. Approximate computing techniques exist to alleviate these requirements but usually require a thorough and tedious analysis of the processing pipeline. This paper presents an application-agnostic Design Space Exploration (DSE) of the buffer-sizing process to reduce the memory footprint of applications while guaranteeing an output quality above a defined threshold. The proposed DSE selects the appropriate bit-width and storage type for buffers to satisfy the constraint. We show in this paper that the proposed DSE reduces the memory footprint of the SqueezeNet CNN by 58.6% with identical Top-1 prediction accuracy, and the full SKA SDP pipeline by 39.7% without degradation, while only testing for a subset of the design space. The proposed DSE is fast enough to be integrated into the design stream of applications.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128145349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
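A hedged sketch of the greedy flavour of such a buffer-sizing DSE: try narrower bit-widths per buffer and keep a change only while the measured output quality stays above the threshold. The quality callback and search order are assumptions for illustration, not the paper's exact exploration strategy.

```python
def size_buffers(buffers, widths, evaluate_quality, threshold):
    """buffers: list of buffer names; widths: candidate bit-widths,
    widest first; evaluate_quality: application-supplied callback that
    scores a {buffer: width} configuration."""
    config = {b: widths[0] for b in buffers}   # start at full precision
    for b in buffers:
        for w in widths[1:]:                   # try narrower widths
            trial = dict(config, **{b: w})
            if evaluate_quality(trial) >= threshold:
                config = trial                 # keep the smaller width
            else:
                break                          # narrower widths fail too
    return config
```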
{"title":"Secure Communication Protocol for Network-on-Chip with Authenticated Encryption and Recovery Mechanism","authors":"Julian Haase, Sebastian Jaster, Elke Franz, D. Göhringer","doi":"10.1109/ASAP54787.2022.00033","DOIUrl":"https://doi.org/10.1109/ASAP54787.2022.00033","url":null,"abstract":"In recent times, Network-on-Chip (NoC) has become the state of the art for communication in Multiprocessor System-on-Chip due to the existing scalability issues in this area. However, these systems are exposed to security threats such as extraction of secret information. Therefore, the need for secure communication arises in such environments. In this work, we present a communication protocol based on authenticated encryption with recovery mechanisms to establish secure end-to-end communication between the NoC nodes. In addition, a selected key agreement approach required for secure communication is implemented. The security functionality is located in the network adapter of each processing element. If data is tampered with or deleted during transmission, recovery mechanisms ensure that the corrupted data is retransmitted by the network adapter without requiring intervention from the processing element. We simulated and implemented the complete system with SystemC TLM using the NoC simulation platform PANACA. Our results show that we can keep a high rate of correctly transmitted information even when attackers have infiltrated the NoC system.","PeriodicalId":207871,"journal":{"name":"2022 IEEE 33rd International Conference on Application-specific Systems, Architectures and Processors (ASAP)","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116897229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
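The verify-and-retransmit idea can be sketched with a keyed MAC standing in for full authenticated encryption; the shared key, packet layout and function names are illustrative, not the paper's protocol, and the key would come from the key agreement step the abstract mentions.

```python
import hmac
import hashlib

KEY = b"shared-session-key"  # hypothetical; established by key agreement

def send(payload: bytes) -> bytes:
    """Append a 32-byte authentication tag to the outgoing packet."""
    return payload + hmac.new(KEY, payload, hashlib.sha256).digest()

def receive(packet: bytes):
    """Verify the tag; on failure the network adapter would trigger a
    retransmission without involving the processing element."""
    payload, tag = packet[:-32], packet[-32:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # corrupted in transit: request retransmission
    return payload
```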