2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Publications

ASAP 2020 TOC
DOI: 10.1109/asap49362.2020.00004 · Published 2020-07-01
Citations: 0

FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort
Mikhail Asiatici, Damian Maiorano, P. Ienne
DOI: 10.1109/ASAP49362.2020.00031 · Published 2020-07-01
Abstract: String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as the sorting of fixed-length keys. Handling variable-length keys in hardware is challenging, and it is no surprise that no string sorters on FPGA have been proposed yet. In this paper, we present Parallel Hybrid Super Scalar String Sample Sort (pHS5) on Intel HARPv2, a heterogeneous CPU-FPGA system with a server-grade multi-core CPU. Our pHS5 is based on the state-of-the-art string sorting algorithm for multi-core shared-memory CPUs, pS5, which we extended with multiple processing elements (PEs) on the FPGA. Each PE accelerates one instance of the most effectively parallelizable dominant kernel of pS5 by up to 33% compared to a single Intel Xeon Broadwell core running at 3.4 GHz. Furthermore, we extended the job scheduling mechanism of pS5 to enable our PEs to compete with the CPU cores for processing the accelerable kernel, while retaining the complex high-level control flow and the sorting of the smaller data sets on the CPU. We accelerate the whole algorithm by up to 10% compared to the 28-thread software baseline running on the 14-core Xeon processor, and by up to 36% at lower thread counts.
Citations: 2
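For readers unfamiliar with sample sort, the sketch below illustrates the kind of classification kernel the PEs accelerate: each string is assigned to a bucket by comparison against splitters drawn from a random sample, and buckets are sorted recursively. This is a simplified hypothetical Python model under our own assumptions; pS5 classifies strings with a ternary search tree of splitters and super-scalar-friendly loops, and its base-case sorters and fallback heuristics differ from the ones shown here.

```python
import bisect
import random

def classify(strings, splitters):
    """Assign each string the index of the splitter bucket it falls into."""
    return [bisect.bisect_right(splitters, s) for s in strings]

def string_sample_sort(strings, num_buckets=8, oversampling=2):
    # Small inputs: fall back to the built-in sorter, much as pS5
    # falls back to dedicated base-case sorters.
    if len(strings) <= num_buckets:
        return sorted(strings)
    # Draw an oversampled random sample and keep equidistant splitters.
    sample = sorted(random.sample(strings, min(len(strings), num_buckets * oversampling)))
    splitters = sample[oversampling - 1 :: oversampling][: num_buckets - 1]
    buckets = [[] for _ in range(len(splitters) + 1)]
    for s, b in zip(strings, classify(strings, splitters)):
        buckets[b].append(s)
    if max(len(b) for b in buckets) == len(strings):
        return sorted(strings)  # degenerate splitters; avoid infinite recursion
    return [s for bucket in buckets for s in string_sample_sort(bucket, num_buckets)]

data = ["banana", "apple", "cherry", "apricot", "fig", "grape", "date"]
assert string_sample_sort(data) == sorted(data)
```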
Persistent Fault Analysis of Neural Networks on FPGA-based Acceleration System
Dawen Xu, Ziyang Zhu, Cheng Liu, Ying Wang, Huawei Li, Lei Zhang, K. Cheng
DOI: 10.1109/ASAP49362.2020.00024 · Published 2020-07-01
Abstract: The increasing hardware failures caused by shrinking semiconductor technologies have a substantial influence on neural accelerators, and improving the resilience of neural network execution becomes a major design challenge, especially for mission-critical applications such as self-driving and medical diagnosis. Reliability analysis of neural network execution is a key step toward understanding the influence of hardware failures, and is thus in high demand. Prior works typically conducted fault analysis of neural network accelerators in simulation and concentrated on the prediction accuracy loss of the models. There is still a lack of systematic fault analysis of the neural network acceleration system that considers both accuracy degradation and system exceptions such as system stall and early termination. In this work, we implemented a representative neural network accelerator and fault injection modules on a Xilinx ARM-FPGA platform and conducted fault analysis of the system using four typical neural network models. We have open-sourced the system on GitHub. Through comprehensive experiments, we identify the system exceptions from the various abnormal behaviours of the FPGA-based neural network acceleration system and analyze the underlying reasons. In particular, we find that the probability of system exceptions dominates the reliability of the system, and that they are mainly caused by faults in the DMA, control unit and instruction memory of the accelerators. Beyond the system exceptions, faults in these components also incur moderate accuracy degradation of the neural network models. Thus, these components are the most fragile parts of the accelerators and need to be hardened for reliable neural network execution.
Citations: 10
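As a point of reference for the simulation-based fault analysis that prior works performed, the sketch below injects a persistent single-bit flip into a float32 weight tensor. It is a hypothetical illustration only: the paper's actual open-source framework injects faults into the accelerator's hardware components (DMA, control unit, instruction memory) on the FPGA, which software bit-flips in weights do not capture.

```python
import numpy as np

def inject_persistent_bitflip(weights: np.ndarray, rng: np.random.Generator):
    """Flip one random bit of one randomly chosen float32 weight, in place.

    Assumes a contiguous float32 array, so the uint32 view aliases it.
    """
    flat = weights.reshape(-1).view(np.uint32)   # reinterpret the float bits
    idx = rng.integers(flat.size)                # which weight
    bit = rng.integers(32)                       # which bit of that weight
    flat[idx] ^= np.uint32(1) << np.uint32(bit)  # persistent single-bit fault

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
inject_persistent_bitflip(w, rng)                # corrupt, then run inference
```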
Anytime Floating-Point Addition and Multiplication - Concepts and Implementations
Marcel Brand, Michael Witterauf, A. Bosio, J. Teich
DOI: 10.1109/ASAP49362.2020.00034 · Published 2020-07-01
Abstract: In this paper, we present anytime instructions for floating-point additions and multiplications. Specific to such instructions is their ability to compute an arithmetic operation at a programmable accuracy of the a most significant bits, where a is encoded in the instruction itself. Contrary to reduced-precision architectures, the word length is maintained throughout the execution. Two approaches are presented for the efficient implementation of anytime additions and multiplications, one based on on-line arithmetic and the other on bitmasking. We propose implementations of anytime functional units for both approaches and evaluate them in terms of error, latency, area, and energy savings. On average, 15% of energy can be saved while computing a floating-point addition with an error of less than 0.1%. Moreover, large latency and energy savings are reported for iterative algorithms, such as a Jacobi algorithm with savings of up to 39% in energy.
Citations: 2
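The bitmasking approach can be modeled in software as follows: compute the full-width result, then zero all but the a most significant mantissa bits. This hedged sketch only mimics the numeric effect; a real anytime unit avoids computing the masked bits in the first place, which is where the latency and energy savings come from, and the exact masking/rounding behavior of the proposed units may differ.

```python
import struct

def anytime_add(x: float, y: float, a: int) -> float:
    """float32 addition accurate in roughly the a most significant mantissa bits.

    a must be in 0..23 (the float32 mantissa width); the word length stays 32 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x + y))[0]
    keep = 23 - a                                   # low mantissa bits to discard
    mask = 0xFFFFFFFF & ~((1 << keep) - 1) if keep > 0 else 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

# The error shrinks as the programmable accuracy a grows.
exact = 1.2345678 + 2.3456789
for a in (4, 8, 23):
    approx = anytime_add(1.2345678, 2.3456789, a)
    print(a, approx, abs(approx - exact))
```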
Reconfigurable Stream-based Tensor Unit with Variable-Precision Posit Arithmetic
Nuno Neves, P. Tomás, N. Roma
DOI: 10.1109/ASAP49362.2020.00033 · Published 2020-07-01
Abstract: The increased adoption of DNN applications drove the emergence of dedicated tensor computing units to accelerate multi-dimensional matrix multiplication operations. Although they deploy highly efficient computing architectures, they often lack support for more general-purpose application domains. Such a limitation occurs both due to their consolidated computation scheme (restricted to matrix multiplication) and due to their frequent adoption of low-precision/custom floating-point formats (unsuited for general application domains). In contrast, this paper proposes a new Reconfigurable Tensor Unit (RTU) that deploys an array of variable-precision Vector Multiply-Accumulate (VMA) units. Each VMA unit leverages the new posit floating-point format and supports the full range of standardized posit precisions in a single SIMD unit, with variable vector-element width. Moreover, the proposed RTU exploits the posit format's support for fused operations, together with spatial and time-multiplexing reconfiguration mechanisms, to fuse and combine multiple VMAs and map high-level and complex operations. The RTU is also supported by an automatic data-streaming infrastructure and a pipelined data-movement scheme, allowing it to accelerate the computation of most data-parallel patterns commonly present in vectorizable applications. The proposed RTU is shown to outperform state-of-the-art tensor and SIMD units present in off-the-shelf platforms, resulting in significant energy-efficiency improvements.
Citations: 5
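The abstract does not detail the VMA datapath, but the variable-precision support rests on the posit format itself: a sign bit, a run-length-encoded regime, es exponent bits, and a fraction with a hidden leading one. As background, the sketch below decodes an n-bit posit into a float; the default parameters (n=16, es=1) and the code are illustrative assumptions, not the RTU's implementation.

```python
def decode_posit(bits: int, n: int = 16, es: int = 1) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                 # NaR (Not a Real)
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask               # two's complement of negative posits
    first = (bits >> (n - 2)) & 1
    run, i = 0, n - 2
    while i >= 0 and ((bits >> i) & 1) == first:
        run, i = run + 1, i - 1
    k = run - 1 if first else -run          # regime value
    i -= 1                                  # skip the regime terminator bit
    exp = 0
    for _ in range(es):                     # exponent, zero-padded if truncated
        exp = (exp << 1) | ((bits >> i) & 1 if i >= 0 else 0)
        i -= 1
    frac_bits = max(i + 1, 0)               # remaining bits form the fraction
    frac = bits & ((1 << frac_bits) - 1)
    value = 1.0 + (frac / (1 << frac_bits) if frac_bits else 0.0)
    value *= 2.0 ** (k * (1 << es) + exp)   # useed^k * 2^exp, useed = 2^(2^es)
    return -value if sign else value

assert decode_posit(0x4000) == 1.0
assert decode_posit(0x5000) == 2.0
assert decode_posit(0x4800) == 1.5
assert decode_posit(0xC000) == -1.0
```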
Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators
Krishna Teja Chitty-Venkata, Arun Kumar Somani
DOI: 10.1109/ASAP49362.2020.00016 · Published 2020-07-01
Abstract: Due to the increasing use of large Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as the Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor composed of multiple individual compute units that efficiently execute Multiplication and Accumulation (MAC) operations. As the size of this array limits how much of a single layer can be processed at once, the computation is carried out serially in several batches, leading to extra compute cycles along both axes. In practice, because of the mismatch between matrix and array sizes, the computation does not map exactly onto the array. In this work, we address the issue of minimizing processing cycles on the array by adjusting the DNN model parameters with a structured, hardware-array-dependent optimization. We introduce two techniques: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning removes redundant parameters to decrease the size of the network; the key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both proposed techniques achieve accuracy similar to the original network while saving a significant number of processing cycles (75%).
Citations: 6
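A minimal sketch of the array-aware idea, under our own assumptions (the abstract does not specify the pruning criterion): shrink a layer's weight matrix to the nearest multiple of the array dimension by dropping the output columns with the smallest L1 norm, so that every computation batch fully occupies the array.

```python
import numpy as np

def array_aware_prune(W: np.ndarray, array_size: int) -> np.ndarray:
    """Drop lowest-L1-norm columns so W's width maps exactly onto the array."""
    rows, cols = W.shape
    keep = (cols // array_size) * array_size        # largest multiple <= cols
    if keep == 0 or keep == cols:
        return W                                    # already aligned, or too small
    importance = np.abs(W).sum(axis=0)              # L1 norm per output column
    kept = np.sort(np.argsort(importance)[-keep:])  # strongest columns, in order
    return W[:, kept]

W = np.random.randn(256, 200).astype(np.float32)
pruned = array_aware_prune(W, array_size=64)        # 200 -> 192 = 3 full batches
print(pruned.shape)                                 # (256, 192)
```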
Training Neural Nets using only an Approximate Tableless LNS ALU
M. Arnold, E. Chester, Corey Johnson
DOI: 10.1109/ASAP49362.2020.00020 · Published 2020-07-01
Abstract: The Logarithmic Number System (LNS) is useful in applications that tolerate approximate computation, such as classification using multi-layer neural networks, which compute nonlinear functions of weighted sums of inputs from previous layers. Supervised learning has two phases: training (finding appropriate weights for the desired classification) and inference (using the weights with approximate sums of products). Several researchers have observed that LNS ALUs used for inference may minimize area and power by being both low-precision and approximate (allowing low-cost, tableless implementations). However, the few works that have also trained with LNS report that at least part of the system needs accurate LNS. This paper describes a novel approximate LNS ALU implemented simply as logic (without tables) that enables the entire back-propagation training to occur in LNS, at one-third the cost of a fixed-point implementation.
Citations: 4
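LNS addition needs the Gaussian logarithm sb(z) = log2(1 + 2^z), which is where lookup tables normally appear. The sketch below is a hypothetical tableless approximation in the same spirit, built from Mitchell-style piecewise-linear log/antilog primitives (shift-and-add in hardware); the paper's actual ALU uses its own, different approximation.

```python
import math

def mitchell_pow2(z: float) -> float:
    i = math.floor(z)
    return (1.0 + (z - i)) * 2.0 ** i   # 2^z ~= (1 + frac) shifted by int part

def mitchell_log2(w: float) -> float:
    i = math.floor(math.log2(w))        # a leading-one detector in hardware
    return i + (w / 2.0 ** i - 1.0)     # log2(w) ~= exponent + normalized frac

def lns_add(x: float, y: float) -> float:
    """Approximate log2(2^x + 2^y) using only tableless primitives."""
    lo, hi = min(x, y), max(x, y)
    return hi + mitchell_log2(1.0 + mitchell_pow2(lo - hi))

# Relative error stays within a few percent: tolerable for approximate
# back-propagation, in line with the paper's premise.
x, y = math.log2(3.0), math.log2(5.0)
print(2.0 ** lns_add(x, y))             # ~7.75, versus the exact 3 + 5 = 8
```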
[ASAP 2020 Title page]
DOI: 10.1109/asap49362.2020.00002 · Published 2020-07-01
Citations: 0

External Referees - ASAP 2020
DOI: 10.1109/asap49362.2020.00009 · Published 2020-07-01
Citations: 0

Dynamic Sharing in Multi-accelerators of Neural Networks on an FPGA Edge Device
Hsin-Yu Ting, Tootiya Giyahchi, A. A. Sani, E. Bozorgzadeh
DOI: 10.1109/ASAP49362.2020.00040 · Published 2020-07-01
Abstract: Edge computing can provide abundant processing resources for compute-intensive applications while bringing services close to end devices. With the increasing demand for computing acceleration at the edge, FPGAs have been deployed to provide custom deep neural network accelerators. This paper explores a DNN accelerator sharing system on an edge FPGA device that serves various DNN applications from multiple end devices simultaneously. The proposed SharedDNN/PlanAhead policy exploits the regularity among requests for various DNN accelerators and determines which accelerator to allocate to each request, and in what order to respond to the requests, so as to achieve maximum responsiveness for a queue of acceleration requests. Our results show an overall performance gain of up to 2.20x and improved utilization that reduces DNN library usage by up to 27%, while staying within the requests' requirements and resource constraints.
Citations: 13
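The abstract does not spell out the PlanAhead policy, so the following is a deliberately naive toy model of the underlying problem: requests for different DNNs drain from a queue, and serving one on an accelerator slot that already hosts its DNN avoids a reconfiguration penalty. The names, the cost constant, and the greedy eviction rule are all our assumptions, not the paper's policy, which also plans ahead across the queue.

```python
from collections import deque

RECONFIG_COST = 5   # hypothetical cost (time units) to load a different accelerator

def schedule(queue: deque, slots: list):
    """Serve each request, preferring a slot already configured with its DNN."""
    trace = []
    while queue:
        dnn = queue.popleft()
        slot = next((i for i, s in enumerate(slots) if s == dnn), None)
        if slot is None:                  # no match: reconfigure some slot
            slot, cost = 0, 1 + RECONFIG_COST
            slots[slot] = dnn             # a real policy would choose the victim
        else:
            cost = 1                      # reuse without reconfiguration
        trace.append((dnn, slot, cost))
    return trace

requests = deque(["resnet", "lstm", "resnet", "mobilenet", "lstm"])
print(schedule(requests, slots=["resnet", "lstm"]))
```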