2020 IEEE 33rd International System-on-Chip Conference (SOCC)最新文献_第5页

Optimizing CNN Accelerator With Improved Roofline Model 改进的屋顶线模型优化CNN加速器

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524754

Shaoxia Fang, Shulin Zeng, Yu Wang

{"title":"Optimizing CNN Accelerator With Improved Roofline Model","authors":"Shaoxia Fang, Shulin Zeng, Yu Wang","doi":"10.1109/socc49529.2020.9524754","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524754","url":null,"abstract":"The external memory I/O bandwidth is the most common performance bottleneck for Convolutional Neural Network(CNN) inference accelerators. On the other hand, performance is also affected by many other factors such as the on-chip memory size and data scheduling strategies, making it difficult to identify the root cause of performance degradation. This paper proposes an improved roofline model specifically for the CNN accelerator, which provides a deep understanding of the bandwidth bottlenecks and points out the direction of optimization. Previous roofline models have focused on modeling and optimizing each layer, while neglecting some high-level optimizations (e.g. layer fusion and batch processing) that alleviate the bandwidth requirements. However, the uneven cross-layer bandwidth requirements can have a significant impact on the overall performance, and the combination of independently optimized layers does not necessarily result in an overall optimal solution. Our model is capable of modeling more complex data scheduling strategies and enables a larger design space than previous roofline models. We use the Xilinx CNN accelerator on ZU9 FPGA as an example for quantitative analysis and optimization. We apply the optimization method derived from the improved roofline model to the original design and ultimately achieve a 1.6x performance improvement. The derived optimization method effectively solves the severe temporary bandwidth overload problem in the original design that leads to the computational inefficiency.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127386595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Welcome Message from the TPC Chairs TPC主席的欢迎辞

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524777

引用次数: 0

A High-Speed Architecture for the Reduction in VDF Based on a Class Group 一种基于类群的VDF高速约简体系结构

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524783

Yifeng Song, Danyang Zhu, Jing Tian, Zhongfeng Wang

{"title":"A High-Speed Architecture for the Reduction in VDF Based on a Class Group","authors":"Yifeng Song, Danyang Zhu, Jing Tian, Zhongfeng Wang","doi":"10.1109/socc49529.2020.9524783","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524783","url":null,"abstract":"Due to the enormous energy consuming involved in the proof of work (POW) process, the resource-efficient blockchain system is urged to be released. The verifiable delay function (VDF), being slow to compute and easy to verify, is believed to be the kernel function of the next-generation blockchain system. In general, the reduction over a class group, involving many complex operations, such as the large-number division and multiplication operations, takes a large portion in the VDF. In this paper, for the first time, we propose a highspeed architecture for the reduction by incorporating algorithmic transformations and architectural optimizations. Firstly, based on the fastest reduction algorithm, we present a modified version to make it more hardware-friendly by introducing a novel transformation method that can efficiently remove the large-number divisions. Secondly, highly parallelized and pipelined architectures are devised respectively for the large-number multiplication and addition operations to reduce the latency and the critical path. Thirdly, a compact state machine is developed to enable maximum overlapping in time for computations. The experiment results show that when computing 209715 reduction steps with the input width of 2048 bits, the proposed design only takes 137.652ms running on an Altera Stratix-10 FPGA at 100MHz frequency, while the original algorithm needs 3278ms when operating over an i7-6850K CPU at 3.6GHz frequency. Thus we have obtained a drastic speedup of nearly 24x over an advanced CPU.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134424076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

A Ferroelectric FET Based In-memory Architecture for Multi-Precision Neural Networks 一种基于铁电场效应晶体管的多精度神经网络内存结构

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524750

T. Soliman, R. Olivo, T. Kirchner, M. Lederer, T. Kämpfe, A. Guntoro, N. Wehn

引用次数: 8

Program for 2020 IEEE 33rd International System-on-Chip Conference (SOCC) 2020年IEEE第33届国际系统芯片会议(soc)议程

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524803

引用次数: 0

Deep Reinforcement Learning for Self-Configurable NoC 自配置NoC的深度强化学习

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524761

Md Farhadur Reza

{"title":"Deep Reinforcement Learning for Self-Configurable NoC","authors":"Md Farhadur Reza","doi":"10.1109/socc49529.2020.9524761","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524761","url":null,"abstract":"Network-on-Chips (NoCs) has been the superior interconnect fabric for multi/many-core on-chip systems because of its scalability and parallelism. On-chip network resources can be dynamically configured to improve the energy-efficiency and performance of NoC. However, large and complex design space in heterogeneous NoC architectures becomes difficult to explore within a reasonable time for optimal trade-offs of energy and performance. Furthermore, reactive resource management is not effective in preventing problems, such as creating thermal hotspots and exceeding chip power budget, from happening in adaptive systems. Therefore, we propose machine learning (ML) technique to provide proactive solution within an instant for both energy and performance efficiency. In this paper, we present deep reinforcement learning (deep RL) techniques to configure the voltage/frequency levels of both NoC routers and links in multicore architectures for energy-efficiency while providing high-performance NoC. We propose the use of reinforcement learning (RL) to configure the NoC resources intelligently based on system utilization and application demands. Additionally, neural networks (NNs) are used to approximate the actions of distributed RL agents in large-scale systems, to mitigate the large cost of traditional table-based RL. Simulations results for 256-core and 16-core NoC architectures under real-world benchmarks show that the proposed approach improves energy-delay product significantly (40%) when compared to traditional non-ML based solution. Furthermore, the proposed solution incurs very low energy and hardware overhead while providing self-configurable NoC to meet the real-time requirements of applications.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121933892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

[Keynote Speaker - 6 abstracts] [主讲人- 6个摘要]

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524769

引用次数: 0

A Reconfigurable Permutation Based Address Encryption Architecture for Memory Security 一种基于可重构排列的内存安全地址加密体系结构

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524762

Yuchen Mei, Li Du, Xuewen He, Yuan Du, Xiaoliang Chen, Zhongfeng Wang

引用次数: 0

A Configurable FPGA Accelerator of Bi-LSTM Inference with Structured Sparsity 结构化稀疏Bi-LSTM推理的可配置FPGA加速器

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524784

Shouliang Guo, Chao Fang, Jun Lin, Zhongfeng Wang

引用次数: 2

Secure Your SoC: Building System-an-Chip Designs for Security 保护您的SoC:构建安全的系统芯片设计

2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524760

S. Bhasin, Trevor E. Carlson, A. Chattopadhyay, Vinay B. Y. Kumar, A. Mendelson, R. Poussier, Yaswanth Tavva

{"title":"Secure Your SoC: Building System-an-Chip Designs for Security","authors":"S. Bhasin, Trevor E. Carlson, A. Chattopadhyay, Vinay B. Y. Kumar, A. Mendelson, R. Poussier, Yaswanth Tavva","doi":"10.1109/socc49529.2020.9524760","DOIUrl":"https://doi.org/10.1109/socc49529.2020.9524760","url":null,"abstract":"Modern System-on-Chip designs (SoCs) are becoming increasingly complex and powerful, catering to a wide range of application domains. Their use in security-critical tasks calls for a holistic approach to SoC design, including security as a first-class architecture constraint, rather than adding security only as an afterthought. The problem is compounded by the inclusion of multiple, potentially untrusted, third party components in the SoC design. To address this challenge systematically, this paper explores four distinct and important aspects of designing secure SoCs. First, starting at the component level, an evaluation framework for assessing component security against physical attacks is proposed. Second, a scalable simulation framework is developed to integrate these secure components which offers flexibility for early- and late-stage SoC development. Third, dynamic and static techniques are proposed to determine when the system is under attack, with a key focus on Hardware Trojans as threat. Finally, a design strategy for integrating untrusted components into a SoC through hardware Root-of-Trust is outlined. For each of these aspects we present early-stage evaluations, and show how these complement each other towards the design of a secure SoC.","PeriodicalId":114740,"journal":{"name":"2020 IEEE 33rd International System-on-Chip Conference (SOCC)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125910662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0