2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP): Latest Publications

ASAP 2020 TOC
DOI: 10.1109/asap49362.2020.00004 · Published 2020-07-01
Citations: 0

FPGAs in the Datacenters: the Case of Parallel Hybrid Super Scalar String Sample Sort
Mikhail Asiatici, Damian Maiorano, P. Ienne
DOI: 10.1109/ASAP49362.2020.00031 · Published 2020-07-01
Abstract: String sorting is an important part of database and MapReduce applications; however, it has not been studied as extensively as the sorting of fixed-length keys. Handling variable-length keys in hardware is challenging, and it is no surprise that no string sorters on FPGA have been proposed yet. In this paper, we present Parallel Hybrid Super Scalar String Sample Sort (pHS5) on Intel HARPv2, a heterogeneous CPU-FPGA system with a server-grade multi-core CPU. Our pHS5 is based on the state-of-the-art string sorting algorithm for multi-core shared-memory CPUs, pS5, which we extended with multiple processing elements (PEs) on the FPGA. Each PE accelerates one instance of the most effectively parallelizable dominant kernel of pS5 by up to 33% compared to a single Intel Xeon Broadwell core running at 3.4 GHz. Furthermore, we extended the job scheduling mechanism of pS5 to enable our PEs to compete with the CPU cores for processing the accelerable kernel, while retaining the complex high-level control flow and the sorting of the smaller data sets on the CPU. We accelerate the whole algorithm by up to 10% compared to the 28-thread software baseline running on the 14-core Xeon processor, and by up to 36% at lower thread counts.
Citations: 2
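For readers unfamiliar with sample sort, the sketch below illustrates the kind of classification kernel the PEs accelerate: each string is assigned to a bucket by comparison against splitters drawn from a random sample, and buckets are sorted recursively. This is a simplified hypothetical Python model under our own assumptions; pS5 classifies strings with a ternary search tree of splitters and super-scalar-friendly loops, and its base-case sorters and fallback heuristics differ from the ones shown here.

```python
import bisect
import random

def classify(strings, splitters):
    """Assign each string the index of the splitter bucket it falls into."""
    return [bisect.bisect_right(splitters, s) for s in strings]

def string_sample_sort(strings, num_buckets=8, oversampling=2):
    # Small inputs: fall back to the built-in sorter, much as pS5
    # falls back to dedicated base-case sorters.
    if len(strings) <= num_buckets:
        return sorted(strings)
    # Draw an oversampled random sample and keep equidistant splitters.
    sample = sorted(random.sample(strings, min(len(strings), num_buckets * oversampling)))
    splitters = sample[oversampling - 1 :: oversampling][: num_buckets - 1]
    buckets = [[] for _ in range(len(splitters) + 1)]
    for s, b in zip(strings, classify(strings, splitters)):
        buckets[b].append(s)
    if max(len(b) for b in buckets) == len(strings):
        return sorted(strings)  # degenerate splitters; avoid infinite recursion
    return [s for bucket in buckets for s in string_sample_sort(bucket, num_buckets)]

data = ["banana", "apple", "cherry", "apricot", "fig", "grape", "date"]
assert string_sample_sort(data) == sorted(data)
```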
Persistent Fault Analysis of Neural Networks on FPGA-based Acceleration System
Dawen Xu, Ziyang Zhu, Cheng Liu, Ying Wang, Huawei Li, Lei Zhang, K. Cheng
DOI: 10.1109/ASAP49362.2020.00024 · Published 2020-07-01
Abstract: The increasing hardware failures caused by shrinking semiconductor technologies have a substantial influence on neural accelerators, and improving the resilience of neural network execution becomes a major design challenge, especially for mission-critical applications such as self-driving and medical diagnosis. Reliability analysis of neural network execution is a key step toward understanding the influence of hardware failures, and is thus in high demand. Prior works typically conducted fault analysis of neural network accelerators in simulation and concentrated on the prediction accuracy loss of the models. There is still a lack of systematic fault analysis of the neural network acceleration system that considers both accuracy degradation and system exceptions such as system stall and early termination. In this work, we implemented a representative neural network accelerator and fault injection modules on a Xilinx ARM-FPGA platform and conducted fault analysis of the system using four typical neural network models. We have open-sourced the system on GitHub. Through comprehensive experiments, we identify the system exceptions from the various abnormal behaviours of the FPGA-based neural network acceleration system and analyze the underlying reasons. In particular, we find that the probability of system exceptions dominates the reliability of the system, and that they are mainly caused by faults in the DMA, control unit and instruction memory of the accelerators. Beyond the system exceptions, faults in these components also incur moderate accuracy degradation of the neural network models. Thus, these components are the most fragile parts of the accelerators and need to be hardened for reliable neural network execution.
Citations: 10
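As a point of reference for the simulation-based fault analysis that prior works performed, the sketch below injects a persistent single-bit flip into a float32 weight tensor. It is a hypothetical illustration only: the paper's actual open-source framework injects faults into the accelerator's hardware components (DMA, control unit, instruction memory) on the FPGA, which software bit-flips in weights do not capture.

```python
import numpy as np

def inject_persistent_bitflip(weights: np.ndarray, rng: np.random.Generator):
    """Flip one random bit of one randomly chosen float32 weight, in place.

    Assumes a contiguous float32 array, so the uint32 view aliases it.
    """
    flat = weights.reshape(-1).view(np.uint32)   # reinterpret the float bits
    idx = rng.integers(flat.size)                # which weight
    bit = rng.integers(32)                       # which bit of that weight
    flat[idx] ^= np.uint32(1) << np.uint32(bit)  # persistent single-bit fault

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
inject_persistent_bitflip(w, rng)                # corrupt, then run inference
```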
Anytime Floating-Point Addition and Multiplication - Concepts and Implementations
Marcel Brand, Michael Witterauf, A. Bosio, J. Teich
DOI: 10.1109/ASAP49362.2020.00034 · Published 2020-07-01
Abstract: In this paper, we present anytime instructions for floating-point additions and multiplications. Specific to such instructions is their ability to compute an arithmetic operation at a programmable accuracy of the a most significant bits, where a is encoded in the instruction itself. Contrary to reduced-precision architectures, the word length is maintained throughout the execution. Two approaches are presented for the efficient implementation of anytime additions and multiplications, one based on on-line arithmetic and the other on bitmasking. We propose implementations of anytime functional units for both approaches and evaluate them in terms of error, latency, area, and energy savings. On average, 15% of energy can be saved while computing a floating-point addition with an error of less than 0.1%. Moreover, large latency and energy savings are reported for iterative algorithms, such as a Jacobi algorithm with savings of up to 39% in energy.
Citations: 2
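The bitmasking approach can be modeled in software as follows: compute the full-width result, then zero all but the a most significant mantissa bits. This hedged sketch only mimics the numeric effect; a real anytime unit avoids computing the masked bits in the first place, which is where the latency and energy savings come from, and the exact masking/rounding behavior of the proposed units may differ.

```python
import struct

def anytime_add(x: float, y: float, a: int) -> float:
    """float32 addition accurate in roughly the a most significant mantissa bits.

    a must be in 0..23 (the float32 mantissa width); the word length stays 32 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x + y))[0]
    keep = 23 - a                                   # low mantissa bits to discard
    mask = 0xFFFFFFFF & ~((1 << keep) - 1) if keep > 0 else 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

# The error shrinks as the programmable accuracy a grows.
exact = 1.2345678 + 2.3456789
for a in (4, 8, 23):
    approx = anytime_add(1.2345678, 2.3456789, a)
    print(a, approx, abs(approx - exact))
```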
Reconfigurable Stream-based Tensor Unit with Variable-Precision Posit Arithmetic
Nuno Neves, P. Tomás, N. Roma
DOI: 10.1109/ASAP49362.2020.00033 · Published 2020-07-01
Abstract: The increased adoption of DNN applications drove the emergence of dedicated tensor computing units to accelerate multi-dimensional matrix multiplication operations. Although they deploy highly efficient computing architectures, they often lack support for more general-purpose application domains. Such a limitation occurs both due to their consolidated computation scheme (restricted to matrix multiplication) and due to their frequent adoption of low-precision/custom floating-point formats (unsuited for general application domains). In contrast, this paper proposes a new Reconfigurable Tensor Unit (RTU) that deploys an array of variable-precision Vector Multiply-Accumulate (VMA) units. Each VMA unit leverages the new posit floating-point format and supports the full range of standardized posit precisions in a single SIMD unit, with variable vector-element width. Moreover, the proposed RTU exploits the posit format's support for fused operations, together with spatial and time-multiplexing reconfiguration mechanisms, to fuse and combine multiple VMAs and map high-level and complex operations. The RTU is also supported by an automatic data-streaming infrastructure and a pipelined data-movement scheme, allowing it to accelerate the computation of most data-parallel patterns commonly present in vectorizable applications. The proposed RTU is shown to outperform state-of-the-art tensor and SIMD units present in off-the-shelf platforms, resulting in significant energy-efficiency improvements.
Citations: 5
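The abstract does not detail the VMA datapath, but the variable-precision support rests on the posit format itself: a sign bit, a run-length-encoded regime, es exponent bits, and a fraction with a hidden leading one. As background, the sketch below decodes an n-bit posit into a float; the default parameters (n=16, es=1) and the code are illustrative assumptions, not the RTU's implementation.

```python
def decode_posit(bits: int, n: int = 16, es: int = 1) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return float("nan")                 # NaR (Not a Real)
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask               # two's complement of negative posits
    first = (bits >> (n - 2)) & 1
    run, i = 0, n - 2
    while i >= 0 and ((bits >> i) & 1) == first:
        run, i = run + 1, i - 1
    k = run - 1 if first else -run          # regime value
    i -= 1                                  # skip the regime terminator bit
    exp = 0
    for _ in range(es):                     # exponent, zero-padded if truncated
        exp = (exp << 1) | ((bits >> i) & 1 if i >= 0 else 0)
        i -= 1
    frac_bits = max(i + 1, 0)               # remaining bits form the fraction
    frac = bits & ((1 << frac_bits) - 1)
    value = 1.0 + (frac / (1 << frac_bits) if frac_bits else 0.0)
    value *= 2.0 ** (k * (1 << es) + exp)   # useed^k * 2^exp, useed = 2^(2^es)
    return -value if sign else value

assert decode_posit(0x4000) == 1.0
assert decode_posit(0x5000) == 2.0
assert decode_posit(0x4800) == 1.5
assert decode_posit(0xC000) == -1.0
```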
Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators
Krishna Teja Chitty-Venkata, Arun Kumar Somani
DOI: 10.1109/ASAP49362.2020.00016 · Published 2020-07-01
Abstract: Due to the increasing use of large Deep Neural Networks (DNNs) over the years, specialized hardware accelerators such as the Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor composed of multiple individual compute units that efficiently execute Multiplication and Accumulation (MAC) operations. As the size of this array limits how much of a single layer can be processed at once, the computation is carried out serially in several batches, leading to extra compute cycles along both axes. In practice, because of the mismatch between matrix and array sizes, the computation does not map exactly onto the array. In this work, we address the issue of minimizing processing cycles on the array by adjusting the DNN model parameters with a structured, hardware-array-dependent optimization. We introduce two techniques: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning removes redundant parameters to decrease the size of the network; the key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model based on the size of the array so as to reduce the number of computation cycles. We observe that both proposed techniques achieve accuracy similar to the original network while saving a significant number of processing cycles (75%).
Citations: 6
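A minimal sketch of the array-aware idea, under our own assumptions (the abstract does not specify the pruning criterion): shrink a layer's weight matrix to the nearest multiple of the array dimension by dropping the output columns with the smallest L1 norm, so that every computation batch fully occupies the array.

```python
import numpy as np

def array_aware_prune(W: np.ndarray, array_size: int) -> np.ndarray:
    """Drop lowest-L1-norm columns so W's width maps exactly onto the array."""
    rows, cols = W.shape
    keep = (cols // array_size) * array_size        # largest multiple <= cols
    if keep == 0 or keep == cols:
        return W                                    # already aligned, or too small
    importance = np.abs(W).sum(axis=0)              # L1 norm per output column
    kept = np.sort(np.argsort(importance)[-keep:])  # strongest columns, in order
    return W[:, kept]

W = np.random.randn(256, 200).astype(np.float32)
pruned = array_aware_prune(W, array_size=64)        # 200 -> 192 = 3 full batches
print(pruned.shape)                                 # (256, 192)
```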
Training Neural Nets using only an Approximate Tableless LNS ALU
M. Arnold, E. Chester, Corey Johnson
DOI: 10.1109/ASAP49362.2020.00020 · Published 2020-07-01
Abstract: The Logarithmic Number System (LNS) is useful in applications that tolerate approximate computation, such as classification using multi-layer neural networks, which compute nonlinear functions of weighted sums of inputs from previous layers. Supervised learning has two phases: training (finding appropriate weights for the desired classification) and inference (using the weights with approximate sums of products). Several researchers have observed that LNS ALUs used for inference may minimize area and power by being both low-precision and approximate (allowing low-cost, tableless implementations). However, the few works that have also trained with LNS report that at least part of the system needs accurate LNS. This paper describes a novel approximate LNS ALU implemented simply as logic (without tables) that enables the entire back-propagation training to occur in LNS, at one-third the cost of a fixed-point implementation.
Citations: 4
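LNS addition needs the Gaussian logarithm sb(z) = log2(1 + 2^z), which is where lookup tables normally appear. The sketch below is a hypothetical tableless approximation in the same spirit, built from Mitchell-style piecewise-linear log/antilog primitives (shift-and-add in hardware); the paper's actual ALU uses its own, different approximation.

```python
import math

def mitchell_pow2(z: float) -> float:
    i = math.floor(z)
    return (1.0 + (z - i)) * 2.0 ** i   # 2^z ~= (1 + frac) shifted by int part

def mitchell_log2(w: float) -> float:
    i = math.floor(math.log2(w))        # a leading-one detector in hardware
    return i + (w / 2.0 ** i - 1.0)     # log2(w) ~= exponent + normalized frac

def lns_add(x: float, y: float) -> float:
    """Approximate log2(2^x + 2^y) using only tableless primitives."""
    lo, hi = min(x, y), max(x, y)
    return hi + mitchell_log2(1.0 + mitchell_pow2(lo - hi))

# Relative error stays within a few percent: tolerable for approximate
# back-propagation, in line with the paper's premise.
x, y = math.log2(3.0), math.log2(5.0)
print(2.0 ** lns_add(x, y))             # ~7.75, versus the exact 3 + 5 = 8
```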
[ASAP 2020 Title page]
DOI: 10.1109/asap49362.2020.00002 · Published 2020-07-01
Citations: 0

External Referees - ASAP 2020
DOI: 10.1109/asap49362.2020.00009 · Published 2020-07-01
Citations: 0

Dynamic Sharing in Multi-accelerators of Neural Networks on an FPGA Edge Device
Hsin-Yu Ting, Tootiya Giyahchi, A. A. Sani, E. Bozorgzadeh
DOI: 10.1109/ASAP49362.2020.00040 · Published 2020-07-01
Abstract: Edge computing can provide abundant processing resources for compute-intensive applications while bringing services close to end devices. With the increasing demand for computing acceleration at the edge, FPGAs have been deployed to provide custom deep neural network accelerators. This paper explores a DNN accelerator sharing system on an edge FPGA device that serves various DNN applications from multiple end devices simultaneously. The proposed SharedDNN/PlanAhead policy exploits the regularity among requests for various DNN accelerators and determines which accelerator to allocate to each request, and in what order to respond to the requests, so as to achieve maximum responsiveness for a queue of acceleration requests. Our results show an overall performance gain of up to 2.20x and improved utilization that reduces DNN library usage by up to 27%, while staying within the requests' requirements and resource constraints.
Citations: 13
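The abstract does not spell out the PlanAhead policy, so the following is a deliberately naive toy model of the underlying problem: requests for different DNNs drain from a queue, and serving one on an accelerator slot that already hosts its DNN avoids a reconfiguration penalty. The names, the cost constant, and the greedy eviction rule are all our assumptions, not the paper's policy, which also plans ahead across the queue.

```python
from collections import deque

RECONFIG_COST = 5   # hypothetical cost (time units) to load a different accelerator

def schedule(queue: deque, slots: list):
    """Serve each request, preferring a slot already configured with its DNN."""
    trace = []
    while queue:
        dnn = queue.popleft()
        slot = next((i for i, s in enumerate(slots) if s == dnn), None)
        if slot is None:                  # no match: reconfigure some slot
            slot, cost = 0, 1 + RECONFIG_COST
            slots[slot] = dnn             # a real policy would choose the victim
        else:
            cost = 1                      # reuse without reconfiguration
        trace.append((dnn, slot, cost))
    return trace

requests = deque(["resnet", "lstm", "resnet", "mobilenet", "lstm"])
print(schedule(requests, slots=["resnet", "lstm"]))
```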