2020 IEEE 33rd International System-on-Chip Conference (SOCC): Latest Papers

Optimizing CNN Accelerator With Improved Roofline Model
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524754
Shaoxia Fang, Shulin Zeng, Yu Wang
Abstract: External memory I/O bandwidth is the most common performance bottleneck for Convolutional Neural Network (CNN) inference accelerators. Performance is also affected by many other factors, such as on-chip memory size and data scheduling strategy, which makes it difficult to identify the root cause of performance degradation. This paper proposes an improved roofline model specifically for CNN accelerators, which provides a deep understanding of the bandwidth bottlenecks and points out the direction of optimization. Previous roofline models have focused on modeling and optimizing each layer, while neglecting high-level optimizations (e.g., layer fusion and batch processing) that alleviate the bandwidth requirements. However, uneven cross-layer bandwidth requirements can have a significant impact on overall performance, and a combination of independently optimized layers does not necessarily yield an overall optimal solution. Our model is capable of modeling more complex data scheduling strategies and enables a larger design space than previous roofline models. We use the Xilinx CNN accelerator on a ZU9 FPGA as an example for quantitative analysis and optimization. We apply the optimization method derived from the improved roofline model to the original design and ultimately achieve a 1.6x performance improvement. The derived optimization method effectively solves the severe temporary bandwidth overload problem in the original design that leads to computational inefficiency.
Citations: 2
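For readers unfamiliar with the baseline that the abstract above builds on, the classic per-layer roofline bound can be sketched as follows. This is the standard roofline formula, not the improved model proposed in the paper; the function name, the accelerator figures, and the layer intensities are all hypothetical values chosen for illustration.

```python
def attainable_gops(peak_gops: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Classic roofline: throughput is capped by compute or by memory traffic."""
    return min(peak_gops, bandwidth_gbs * ops_per_byte)


# Hypothetical accelerator: 1000 GOPS peak, 10 GB/s external memory bandwidth.
PEAK_GOPS = 1000.0
BANDWIDTH_GBS = 10.0

# Hypothetical layers: (name, operational intensity in operations per byte).
layers = [("conv_a", 200.0), ("conv_b", 40.0), ("fc", 2.0)]

for name, intensity in layers:
    bound = attainable_gops(PEAK_GOPS, BANDWIDTH_GBS, intensity)
    kind = "compute-bound" if bound >= PEAK_GOPS else "bandwidth-bound"
    print(f"{name}: {bound:.0f} GOPS ({kind})")
```

Evaluating each layer in isolation like this is exactly the per-layer view that the abstract describes as insufficient, since it ignores cross-layer effects such as layer fusion and batching.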
Welcome Message from the TPC Chairs
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524777
Citations: 0
A High-Speed Architecture for the Reduction in VDF Based on a Class Group
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524783
Yifeng Song, Danyang Zhu, Jing Tian, Zhongfeng Wang
Abstract: Due to the enormous energy consumption of the proof-of-work (PoW) process, there is an urgent need for resource-efficient blockchain systems. The verifiable delay function (VDF), being slow to compute and easy to verify, is believed to be the kernel function of the next-generation blockchain system. In general, the reduction over a class group, which involves many complex operations such as large-number division and multiplication, accounts for a large portion of the VDF computation. In this paper, for the first time, we propose a high-speed architecture for the reduction by incorporating algorithmic transformations and architectural optimizations. Firstly, based on the fastest reduction algorithm, we present a modified version that is more hardware-friendly, introducing a novel transformation method that efficiently removes the large-number divisions. Secondly, highly parallelized and pipelined architectures are devised for the large-number multiplication and addition operations, respectively, to reduce the latency and the critical path. Thirdly, a compact state machine is developed to enable maximum overlap in time between computations. The experimental results show that when computing 209715 reduction steps with an input width of 2048 bits, the proposed design takes only 137.652 ms running on an Altera Stratix-10 FPGA at 100 MHz, while the original algorithm needs 3278 ms on an i7-6850K CPU at 3.6 GHz. Thus we have obtained a drastic speedup of nearly 24x over an advanced CPU.
Citations: 0
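The speedup quoted in the abstract above can be checked directly from the reported timings. The short calculation below also derives an average cycle count per reduction step; that second figure is not reported in the abstract and assumes the whole 137.652 ms is spent on the 209715 reduction steps.

```python
# Figures quoted in the abstract.
FPGA_TIME_MS = 137.652
CPU_TIME_MS = 3278.0
FPGA_CLOCK_HZ = 100e6
REDUCTION_STEPS = 209715

# Ratio of CPU time to FPGA time for the same workload.
speedup = CPU_TIME_MS / FPGA_TIME_MS

# Derived, not reported: average cycles per step, assuming the full run time
# is reduction work.
total_cycles = FPGA_TIME_MS * 1e-3 * FPGA_CLOCK_HZ
cycles_per_step = total_cycles / REDUCTION_STEPS

print(f"speedup: {speedup:.1f}x")
print(f"average cycles per reduction step: {cycles_per_step:.1f}")
```

The ratio comes out just under 24, consistent with the "nearly 24x" claim.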
A Ferroelectric FET Based In-memory Architecture for Multi-Precision Neural Networks
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524750
T. Soliman, R. Olivo, T. Kirchner, M. Lederer, T. Kämpfe, A. Guntoro, N. Wehn
Abstract: Computing-in-memory (CIM) is a promising approach to improve the throughput and the energy efficiency of deep neural network (DNN) processors. So far, resistive nonvolatile memories have been adapted to build crossbar-based accelerators for DNN inference. However, such structures suffer from several drawbacks such as sneak paths, large ADCs/DACs, and high write energy. In this paper we present a mixed-signal in-memory hardware accelerator for CNNs. We propose an in-memory inference system that uses FeFETs as the main nonvolatile memory cell. We show how the proposed crossbar unit cell can overcome the aforementioned issues while reducing unit cell size and power consumption. The proposed system decomposes multi-bit operands into single-bit operations. We then recombine them without any loss of precision using accumulators and shifters within the crossbar and across different crossbars. Simulations demonstrate that we can outperform state-of-the-art efficiencies with 3.28 TOPS/W and can pack 1.64 TOPS in an area of 1.52 mm² using 22 nm FDSOI technology.
Citations: 8
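The decomposition of multi-bit operands into single-bit operations described in the abstract above can be illustrated in software. This is a minimal sketch of the general bit-serial idea for unsigned integers, not the paper's circuit; the function names and test vectors are hypothetical.

```python
def bit_planes(values, bits):
    """Split unsigned integers into `bits` binary vectors, least significant first."""
    return [[(v >> b) & 1 for v in values] for b in range(bits)]


def bit_serial_dot(weights, activations, w_bits, a_bits):
    """Dot product built only from 1-bit multiplies (AND), additions, and shifts."""
    total = 0
    for i, w_plane in enumerate(bit_planes(weights, w_bits)):
        for j, a_plane in enumerate(bit_planes(activations, a_bits)):
            # One single-bit pass: AND the bits, then count the ones.
            partial = sum(w & a for w, a in zip(w_plane, a_plane))
            # Shift the partial sum to its binary weight and accumulate.
            total += partial << (i + j)
    return total


# Hypothetical 3-bit weights and 4-bit activations.
weights = [3, 5, 7, 2]
activations = [1, 4, 6, 15]
reference = sum(w * a for w, a in zip(weights, activations))
print(bit_serial_dot(weights, activations, 3, 4) == reference)
```

Because every partial sum is an exact integer and the shifts restore each bit pair's weight, the recombined result matches the ordinary multi-bit dot product with no loss of precision, which is the property the abstract highlights.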
Program for 2020 IEEE 33rd International System-on-Chip Conference (SOCC)
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524803
Citations: 0
Deep Reinforcement Learning for Self-Configurable NoC
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524761
Md Farhadur Reza
Abstract: Networks-on-Chip (NoCs) have been the superior interconnect fabric for multi/many-core on-chip systems because of their scalability and parallelism. On-chip network resources can be dynamically configured to improve the energy efficiency and performance of the NoC. However, the large and complex design space of heterogeneous NoC architectures is difficult to explore within a reasonable time for optimal trade-offs between energy and performance. Furthermore, reactive resource management is not effective in preventing problems such as thermal hotspots and exceeding the chip power budget in adaptive systems. Therefore, we propose a machine learning (ML) technique that provides a proactive solution within an instant for both energy and performance efficiency. In this paper, we present deep reinforcement learning (deep RL) techniques to configure the voltage/frequency levels of both NoC routers and links in multicore architectures for energy efficiency while providing a high-performance NoC. We propose the use of reinforcement learning (RL) to configure the NoC resources intelligently based on system utilization and application demands. Additionally, neural networks (NNs) are used to approximate the actions of distributed RL agents in large-scale systems, to mitigate the large cost of traditional table-based RL. Simulation results for 256-core and 16-core NoC architectures under real-world benchmarks show that the proposed approach improves the energy-delay product significantly (40%) compared to a traditional non-ML-based solution. Furthermore, the proposed solution incurs very low energy and hardware overhead while providing a self-configurable NoC that meets the real-time requirements of applications.
Citations: 1
[Keynote Speaker - 6 abstracts]
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524769
Citations: 0
A Reconfigurable Permutation Based Address Encryption Architecture for Memory Security
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524762
Yuchen Mei, Li Du, Xuewen He, Yuan Du, Xiaoliang Chen, Zhongfeng Wang
Abstract: Most existing memory encryption techniques in IoT devices are based on data encryption. The level of security increases at the cost of increased encryption algorithm complexity, resulting in large power consumption and area overhead for high-security devices. In this paper, we take a significantly different approach and encrypt the device memory through address encryption. A reconfigurable architecture called Permutation-based Address Encryption (PAE) is proposed, for the first time, to encrypt the device memory with minor hardware overhead and much shorter processing time. The architecture is synthesized in SMIC 40 nm standard CMOS technology. Compared with the Data Encryption Standard (DES), the proposed PAE achieves 16x the encryption speed and 1.4x the effective key length. When combined with DES, the PAE+DES encryption outperforms existing hardware Advanced Encryption Standard (AES) implementations by almost 2x in power efficiency and more than 1.5x in area efficiency, with better security, making it a promising hardware encryption technique for IoT devices.
Citations: 0
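The general idea of scrambling addresses with a permutation, as opposed to encrypting the stored data, can be shown with a toy example. The abstract above does not describe the internals of PAE, so the sketch below is only an assumed illustration: a key is modeled as a permutation of address bit positions, and every name in it is hypothetical. It is not cryptographically secure and is not the paper's architecture.

```python
import random


def make_key(width, seed):
    """Hypothetical key: a seeded permutation of the address bit positions."""
    positions = list(range(width))
    random.Random(seed).shuffle(positions)
    return positions


def permute_bits(address, key):
    """Move bit i of the address to bit position key[i]."""
    out = 0
    for src, dst in enumerate(key):
        out |= ((address >> src) & 1) << dst
    return out


def invert_key(key):
    """Build the permutation that undoes `key`."""
    inverse = [0] * len(key)
    for src, dst in enumerate(key):
        inverse[dst] = src
    return inverse


# Toy 8-bit address space with an arbitrary seed.
key = make_key(8, seed=2020)
scrambled = [permute_bits(a, key) for a in range(256)]
restored = [permute_bits(s, invert_key(key)) for s in scrambled]
print(restored == list(range(256)))
```

A bit permutation maps the address space onto itself one-to-one, so no two logical addresses collide and the original address is always recoverable with the inverse permutation.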
A Configurable FPGA Accelerator of Bi-LSTM Inference with Structured Sparsity
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524784
Shouliang Guo, Chao Fang, Jun Lin, Zhongfeng Wang
Abstract: To deploy Bi-directional Long Short-Term Memory (Bi-LSTM) on resource-constrained embedded devices, this work presents a configurable FPGA-based Bi-LSTM accelerator enabling structured compression. Firstly, a dense Bi-LSTM model is thoroughly slimmed by a hybrid quantization scheme and structured top-k pruning. Secondly, the energy consumed by external memory access is significantly reduced by the proposed row-reuse computing pattern. Finally, the proposed accelerator is capable of handling a structured sparse Bi-LSTM model, benefiting from the algorithm-hardware co-design workflow. It is also flexible enough to perform inference on Bi-LSTM models with any feature dimension, sequence length, and number of layers. Implemented on the Intel Cyclone V SXC5 SoC FPGA platform, the proposed accelerator achieves 189.69 GOPS on structured sparse Bi-LSTM networks without batching. Compared with implementations on CPU and GPU, the low-cost FPGA accelerator achieves 43.5x and 6.3x speedup in latency and 520.9x and 46.5x improvement in energy efficiency, respectively.
Citations: 2
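One common reading of the "structured top-k pruning" mentioned in the abstract above is to keep the same number of largest-magnitude weights in every row, which gives hardware a balanced workload. The abstract does not state the pruning granularity, so the per-row choice, the function names, and the sample matrix below are assumptions for illustration only.

```python
def top_k_prune_row(row, k):
    """Keep the k largest-magnitude entries of a row and zero the rest."""
    ranked = sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
    keep = set(ranked[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(row)]


def top_k_prune(matrix, k):
    """Apply the same k to every row so each row has equal sparsity."""
    return [top_k_prune_row(row, k) for row in matrix]


# Hypothetical 2x4 weight matrix, pruned to 2 nonzeros per row.
weights = [
    [0.1, -0.9, 0.3, 0.05],
    [0.7, 0.2, -0.6, 0.0],
]
for row in top_k_prune(weights, 2):
    print(row)
```

Fixing the nonzero count per row means every row costs the same number of multiply-accumulate operations, which is the usual motivation for structured over unstructured sparsity.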
Secure Your SoC: Building System-on-Chip Designs for Security
2020 IEEE 33rd International System-on-Chip Conference (SOCC) Pub Date : 2020-09-08 DOI: 10.1109/socc49529.2020.9524760
S. Bhasin, Trevor E. Carlson, A. Chattopadhyay, Vinay B. Y. Kumar, A. Mendelson, R. Poussier, Yaswanth Tavva
Abstract: Modern System-on-Chip designs (SoCs) are becoming increasingly complex and powerful, catering to a wide range of application domains. Their use in security-critical tasks calls for a holistic approach to SoC design, including security as a first-class architecture constraint, rather than adding security only as an afterthought. The problem is compounded by the inclusion of multiple, potentially untrusted, third-party components in the SoC design. To address this challenge systematically, this paper explores four distinct and important aspects of designing secure SoCs. First, starting at the component level, an evaluation framework for assessing component security against physical attacks is proposed. Second, a scalable simulation framework is developed to integrate these secure components, offering flexibility for early- and late-stage SoC development. Third, dynamic and static techniques are proposed to determine when the system is under attack, with a key focus on Hardware Trojans as the threat. Finally, a design strategy for integrating untrusted components into an SoC through a hardware Root-of-Trust is outlined. For each of these aspects we present early-stage evaluations and show how they complement each other in the design of a secure SoC.
Citations: 0