2021 IEEE 39th International Conference on Computer Design (ICCD): Latest Papers, Page 9

PRL: Standardizing Performance Monitoring Library for High-Integrity Real-Time Systems

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00061

J. Giesen, E. Mezzetti, J. Abella, F. Cazorla

Abstract: The use of complex processors is becoming ubiquitous in High-Integrity Systems (HIS). To deal with the increased complexity of these processors, Performance Monitoring Counters (PMCs) are increasingly used to reason about software behavior and provide the necessary evidence to support software certification. However, the use of PMCs in HIS is relatively recent and hence far from being standardized. As a result, software engineers are forced to resort to highly customized, low-level programming of platform-specific PMC control registers, which is both error-prone and time-consuming. To cover this gap, we propose building on the PAPI library, a standardized performance monitoring solution in the mainstream domain, and develop a PMC Reading Library (PRL) for configuring and collecting traceable events while capturing HIS-specific requirements and peculiarities. We instantiate PRL in a reference automotive configuration to show that PRL meets key HIS requirements: negligible footprint, limited and predictable overhead, and accurate collection of hardware events by filtering out the impact of interrupts and context switches.

Citations: 0

Copyright

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/iccd53106.2021.00003

Citations: 0

HosNa: A DPC++ Benchmark Suite for Heterogeneous Architectures

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00084

Najmeh Nazari Bavarsad, Hosein Mohammadi Makrani, H. Sayadi, Lawrence Landis, S. Rafatirad, H. Homayoun

Abstract: Most data centers have equipped their general-purpose processors with hardware accelerators to reduce power consumption and improve utilization. Hardware accelerators offer highly energy-efficient computation for a wide range of applications; however, programming them is not as efficient as programming processors. To bridge the gap, Intel developed a cloud-based infrastructure called DevCloud that connects Intel® Xeon® Scalable Processors to GPUs and FPGAs to deliver high compute performance for emerging workloads. DevCloud assists developers with their compute-intensive tasks and provides access to precompiled software optimized for Intel® architecture. To reduce programming complexity and minimize the barriers to adopting new hardware technology, Intel also provided a unified, cross-architecture programming model called oneAPI based on the Data-Parallel C++ (DPC++) language. In this paper, we introduce HosNa, the first DPC++ benchmark suite that can be used for the evaluation of Intel FPGAs and DPC++ productivity. Moreover, we present the characterization of the proposed benchmarks and the evaluation of the implemented hardware accelerators in terms of speedup and latency.

Citations: 2

Efficient Table-Based Polynomial on FPGA

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00066

Marco Barbone, B. W. Kwaadgras, U. Oelfke, W. Luk, G. Gaydadjiev

Abstract: Field Programmable Gate Arrays (FPGAs) are gaining popularity in the context of scientific computing due to the recent advances of High-Level Synthesis (HLS) toolchains for customised hardware implementations combined with the increase in computing capabilities of modern FPGAs. As a result, developers are able to implement more complex scientific workloads, which often require the evaluation of univariate numerical functions. In this study, we propose a methodology for table-based polynomial interpolation aiming at producing area-efficient implementations of such functions on FPGAs, achieving the same accuracy as direct implementations at similar performance. We also provide a rigorous error analysis to guarantee the correctness of the results. Our methodology covers the forecast of resource utilisation of the polynomial interpolator and, based on the characteristics of the function, guides the developer to the most area-efficient FPGA implementation. Our experiments show that in the case of a radiation spectrum of a Black Body application based on evaluating Planck's Law, it is possible to reduce resource utilisation by up to 90% when compared to direct implementations not using table-based methods. Moreover, when only the kernels are considered, our method uses up to two orders of magnitude fewer resources with no performance penalties. Building on previous, more theoretical works, our study investigates practical applications of table-based methods in the context of high performance and scientific computing, where they are used to implement common but more complex functions than the elementary functions widely studied in the related literature.
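As a rough illustration of the table-based idea described in this abstract, the Python sketch below precomputes per-segment quadratic coefficients and evaluates them with a table lookup followed by Horner's rule. It is not the authors' methodology: the uniform segmentation, the quadratic degree, the names `build_table`, `evaluate` and `planck_shape`, and the dimensionless Planck-style function x^3 / (e^x - 1) are all assumptions chosen for brevity.

```python
import math


def planck_shape(x):
    """Dimensionless Planck-style spectrum shape x^3 / (e^x - 1) (illustrative choice)."""
    return x ** 3 / math.expm1(x)


def build_table(f, lo, hi, segments):
    """Precompute quadratic coefficients (c0, c1, c2) for each uniform segment.

    Each polynomial matches f at the segment's left edge, midpoint and right
    edge, and is expressed in the local variable t in [0, 1].
    """
    width = (hi - lo) / segments
    table = []
    for i in range(segments):
        x0 = lo + i * width
        y0, y1, y2 = f(x0), f(x0 + width / 2), f(x0 + width)
        # Quadratic through (0, y0), (0.5, y1), (1, y2).
        c0 = y0
        c1 = -3 * y0 + 4 * y1 - y2
        c2 = 2 * y0 - 4 * y1 + 2 * y2
        table.append((c0, c1, c2))
    return table, lo, width


def evaluate(table_info, x):
    """Approximate f(x): one table lookup plus two multiply-adds (Horner's rule)."""
    table, lo, width = table_info
    pos = (x - lo) / width
    i = min(max(int(pos), 0), len(table) - 1)  # clamp the index to the table range
    t = pos - i
    c0, c1, c2 = table[i]
    return c0 + t * (c1 + t * c2)


table = build_table(planck_shape, 0.5, 10.0, 256)
approx = evaluate(table, 2.82)
exact = planck_shape(2.82)
```

In a sketch like this, the number of segments and the polynomial degree trade table size against arithmetic per evaluation, which is the kind of resource trade-off the abstract says the methodology forecasts.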

Citations: 2

AdaptBit-HD: Adaptive Model Bitwidth for Hyperdimensional Computing

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-10-01 DOI: 10.1109/ICCD53106.2021.00026

Justin Morris, Si Thu Kaung Set, Gadi Rosen, M. Imani, Baris Aksanli, T. Simunic

Abstract: Brain-inspired Hyperdimensional (HD) computing is a novel computing paradigm emulating the neuron's activity in high-dimensional space. The first step in HD computing is to map each data point into high-dimensional space (e.g., 10,000 dimensions). This poses several problems. For instance, the size of the data can explode and all subsequent operations need to be performed in parallel in D = 10,000 dimensions. Prior work alleviated this issue with model quantization. The hypervectors (HVs) could then be stored in less space than the original data, and lower-bitwidth operations could be used to save energy. However, prior work quantized all samples to the same bitwidth. We propose AdaptBit-HD, an Adaptive Model Bitwidth Architecture for accelerating HD Computing. AdaptBit-HD operates on the bits of the quantized model one bit at a time to save energy when fewer bits can be used to find the correct class. With AdaptBit-HD, we can achieve both high accuracy by utilizing all the bits when necessary and high energy efficiency by terminating execution at lower bits when our design is confident in the output. We additionally design an end-to-end FPGA accelerator for AdaptBit-HD. Compared to 16-bit models, AdaptBit-HD is 14× more energy efficient, and compared to binary models, AdaptBit-HD is 1.1% more accurate, which is comparable in accuracy to 16-bit models. This demonstrates that AdaptBit-HD is able to achieve the accuracy of full-precision models with the energy efficiency of binary models.
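The bit-at-a-time idea in this abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the AdaptBit-HD design: the function name `classify_bit_serial`, the binary query, the unsigned quantized class models, the dot-product similarity, and the stopping rule are all hypothetical. The rule used here stops as soon as the lead of the best class exceeds the most that the unprocessed low-order bits could still add, so the early answer always equals the full-precision answer; the paper's own confidence criterion may differ.

```python
def classify_bit_serial(class_models, query, bits):
    """Return (predicted_class, bit_planes_used).

    class_models: one list of unsigned `bits`-bit integers per class.
    query: list of 0/1 values with the same length as each class model.
    Bit planes are processed from the most significant bit downward.
    """
    active = sum(query)  # number of dimensions that can contribute to a score
    scores = [0] * len(class_models)
    used = 0
    for b in range(bits - 1, -1, -1):
        used += 1
        for c, model in enumerate(class_models):
            # Dot product between the query and bit plane b of this class model.
            plane = sum(q & ((m >> b) & 1) for q, m in zip(query, model))
            scores[c] += plane << b
        ranked = sorted(scores, reverse=True)
        # Upper bound on what the remaining lower bits can add to any class.
        remaining_max = ((1 << b) - 1) * active
        if len(ranked) == 1 or ranked[0] - ranked[1] > remaining_max:
            break  # the winner can no longer change
    return scores.index(max(scores)), used


models = [[15, 0, 15, 0], [1, 2, 1, 2], [0, 3, 0, 3]]
prediction, planes = classify_bit_serial(models, [1, 0, 1, 0], bits=4)
```

In this toy input the first class wins after a single bit plane, which is the situation where skipping the lower bits would save work.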

Citations: 2

The Accuracy and Efficiency of Posit Arithmetic

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-09-16 DOI: 10.1109/ICCD53106.2021.00024

Ștefan-Dan Ciocîrlan, Dumitrel Loghin, Lavanya Ramapantulu, N. Tapus, Y. M. Teo

Abstract: Motivated by the increasing interest in the posit numeric format, in this paper we evaluate the accuracy and efficiency of posit arithmetic in contrast to the traditional IEEE 754 32-bit floating-point (FP32) arithmetic. We first design and implement a Posit Arithmetic Unit (PAU), called POSAR, with flexible bit-sized arithmetic suitable for applications that can trade accuracy for savings in chip area. Next, we analyze the accuracy and efficiency of POSAR with a series of benchmarks including mathematical computations, ML kernels, NAS Parallel Benchmarks (NPB), and Cifar-10 CNN. This analysis is done on our implementation of POSAR integrated into a RISC-V Rocket Chip core in comparison with the IEEE 754-based Floating Point Unit (FPU) of Rocket Chip. Our analysis shows that POSAR can outperform the FPU, but the results are not spectacular. For NPB, 32-bit posit achieves better accuracy than FP32 and improves the execution by up to 2%. However, POSAR with 32-bit posit needs 30% more FPGA resources compared to the FPU. For classic ML algorithms, we find that 8-bit posits are not suitable to replace FP32 because they exhibit low accuracy leading to wrong results. Instead, 16-bit posit offers the best option in terms of accuracy and efficiency. For example, 16-bit posit achieves the same Top-1 accuracy as FP32 on a Cifar-10 CNN with a speedup of 18%.

Citations: 5

QFlow: Quantitative Information Flow for Security-Aware Hardware Design in Verilog

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-09-06 DOI: 10.1109/ICCD53106.2021.00097

Lennart M. Reimann, Luca Hanel, Dominik Sisejkovic, Farhad Merchant, R. Leupers

Abstract: The enormous amount of code required to design modern hardware implementations often leads to critical vulnerabilities being overlooked. Vulnerabilities that compromise the confidentiality of sensitive data, such as cryptographic keys, have an especially large impact on the trustworthiness of an entire system. Information flow analysis can establish whether information from sensitive signals flows towards outputs or untrusted components of the system. But most of these analytical strategies rely on the non-interference property, stating that the untrusted targets must not be influenced by the source's data, which is shown to be too inflexible for many applications. To address this issue, there are approaches to quantify the information flow between components such that insignificant leakage can be neglected. Due to the high computational complexity of this quantification, approximations are needed, which introduce mispredictions. To tackle those limitations, we reformulate the approximations. Further, we propose a tool, QFlow, with a higher detection rate than previous tools. It can be used by non-experienced users to identify data leakages in hardware designs, thus facilitating a security-aware design process.
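To make the contrast between non-interference and quantified leakage concrete, the Python sketch below computes exact Shannon leakage by brute force for tiny functions. It is not QFlow's algorithm and does not process Verilog: it assumes a uniformly distributed secret, a deterministic function with no other inputs (in which case the leakage equals the entropy of the output), and a hypothetical helper name `shannon_leakage_bits`. The exhaustive enumeration also hints at why approximations are needed for realistic designs.

```python
import math
from collections import Counter


def shannon_leakage_bits(func, secret_bits):
    """Bits of a uniform `secret_bits`-bit secret revealed by observing func(secret).

    For a deterministic function of a uniform secret, the leakage is the
    Shannon entropy of the output distribution. Cost grows as 2**secret_bits.
    """
    n = 1 << secret_bits
    counts = Counter(func(s) for s in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


parity_leak = shannon_leakage_bits(lambda key: key & 1, 8)    # exposes one key bit
full_leak = shannon_leakage_bits(lambda key: key, 8)          # exposes the whole key
check_leak = shannon_leakage_bits(lambda key: key == 42, 8)   # comparison against a guess
```

All three functions violate non-interference, yet the comparison against a single guess leaks only a small fraction of a bit, which is the kind of flow a quantitative analysis could treat as insignificant.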

Citations: 12

Consistent RDMA-Friendly Hashing on Remote Persistent Memory

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-07-14 DOI: 10.1109/ICCD53106.2021.00037

Xinxin Liu, Yu Hua, Rong Bai

Abstract: Coalescing RDMA and Persistent Memory (PM) delivers high end-to-end performance for networked storage systems, which requires rethinking the design of efficient hash structures. In general, existing hashing schemes optimize RDMA and PM separately, thus only partially addressing the problems of RDMA Access Amplification and High-Overhead PM Consistency. In order to address these problems, we propose continuity hashing, which is a "one-stone-two-birds" design to optimize both RDMA and PM. Continuity hashing leverages a fine-grained contiguous shared region, called SBuckets, to provide standby positions for the two neighbouring buckets in case of hash collisions. In continuity hashing, a remote read only needs a single RDMA read to directly fetch the home bucket and the neighbouring SBuckets, which contain all the positions that can hold a key-value item, thus alleviating RDMA access amplification. Continuity hashing further leverages indicators that can be atomically modified to support log-free PM consistency for all the write operations. Evaluation results demonstrate that compared with state-of-the-art techniques, continuity hashing achieves high throughput, low latency and the smallest number of PM writes with acceptable load factors.

Citations: 2

SME: ReRAM-based Sparse-Multiplication-Engine to Squeeze-Out Bit Sparsity of Neural Network

2021 IEEE 39th International Conference on Computer Design (ICCD) Pub Date : 2021-03-02 DOI: 10.1109/ICCD53106.2021.00072

Fangxin Liu, Wenbo Zhao, Yilong Zhao, Zongwu Wang, Tao Yang, Zhezhi He, Naifeng Jing, Xiaoyao Liang, Li Jiang

Abstract: Resistive Random-Access-Memory (ReRAM) crossbar is a promising technique for deep neural network (DNN) accelerators, thanks to its in-memory and in-situ analog computing abilities for Vector-Matrix Multiplication-and-Accumulations (VMMs). However, it is challenging for crossbar architecture to exploit the sparsity in DNNs. It inevitably causes complex and costly control to exploit fine-grained sparsity due to the limitation of tightly-coupled crossbar structure. As a countermeasure, we develop a novel ReRAM-based DNN accelerator, named Sparse-Multiplication-Engine (SME), based on a hardware and software co-design framework. First, we orchestrate the bit-sparse pattern to increase the density of bit-sparsity based on existing quantization methods. Second, we propose a novel weight mapping mechanism to slice the bits of a weight across the crossbars and splice the activation results in peripheral circuits. This mechanism can decouple the tightly-coupled crossbar structure and cumulate the sparsity in the crossbar. Finally, a superior squeeze-out scheme empties the crossbars mapped with highly-sparse non-zeros from the previous two steps. We design the SME architecture and discuss its use for other quantization methods and different ReRAM cell technologies. Compared with prior state-of-the-art designs, the SME shrinks the use of crossbars by up to 8.7× and 2.1× using ResNet-50 and MobileNet-v2, respectively, with ≤ 0.3% accuracy drop on ImageNet.

Citations: 10