ACM Trans. Design Autom. Electr. Syst.最新文献

High-Level Synthesis Implementation of an Embedded Real-Time HEVC Intra Encoder on FPGA for Media Applications 基于FPGA的媒体应用嵌入式实时HEVC内部编码器的高级综合实现

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-07-31 DOI: 10.1145/3491215

Panu Sjövall, Ari Lemmetti, Jarno Vanne, Sakari Lahti, Timo D. Hämäläinen

{"title":"High-Level Synthesis Implementation of an Embedded Real-Time HEVC Intra Encoder on FPGA for Media Applications","authors":"Panu Sjövall, Ari Lemmetti, Jarno Vanne, Sakari Lahti, Timo D. Hämäläinen","doi":"10.1145/3491215","DOIUrl":"https://doi.org/10.1145/3491215","url":null,"abstract":"High Efficiency Video Coding (HEVC) is the key enabling technology for numerous modern media applications. Overcoming its computational complexity and customizing its rich features for real-time HEVC encoder implementations, calls for automated design methodologies. This article introduces the first complete High-Level Synthesis (HLS) implementation for HEVC intra encoder on FPGA. The C source code of our open-source Kvazaar HEVC encoder is used as a design entry point for HLS that is applied throughout the whole encoder design process, from data-intensive coding tools like intra prediction and discrete transforms to more control-oriented tools such as context-adaptive binary arithmetic coding (CABAC). Our prototype is run on Nokia AirFrame Cloud Server equipped with 2.4 GHz dual 14-core Intel Xeon processors and two Intel Arria 10 PCIe FPGA accelerator cards with 40 Gigabit Ethernet. This proof-of-concept system is designed for hardware-accelerated HEVC encoding and it achieves real-time 4K coding speed up to 120 fps. The coding performance can be easily scaled up by adding practically any number of network-connected FPGA cards to the system. These results indicate that our HLS proposal not only boosts development time, but also provides previously unseen design scalability with competitive performance over the existing FPGA and ASIC encoder implementations.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"154 8 1","pages":"35:1-35:34"},"PeriodicalIF":0.0,"publicationDate":"2022-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83185883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Achieving High In Situ Training Accuracy and Energy Efficiency with Analog Non-Volatile Synaptic Devices 利用模拟非易失性突触器件实现高原位训练精度和能量效率

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-07-31 DOI: 10.1145/3500929

Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu

{"title":"Achieving High In Situ Training Accuracy and Energy Efficiency with Analog Non-Volatile Synaptic Devices","authors":"Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu","doi":"10.1145/3500929","DOIUrl":"https://doi.org/10.1145/3500929","url":null,"abstract":"On-device embedded artificial intelligence prefers the adaptive learning capability when deployed in the field, and thus in situ training is required. The compute-in-memory approach, which exploits the analog computation within the memory array, is a promising solution for deep neural network (DNN) on-chip acceleration. Emerging non-volatile memories are of great interest, serving as analog synapses due to their multilevel programmability. However, the asymmetry and nonlinearity in the conductance tuning remain grand challenges for achieving high in situ training accuracy. In addition, analog-to-digital converters at the edge of the memory array introduce quantization errors. In this work, we present an algorithm-hardware co-optimization to overcome these challenges. We incorporate the device/circuit non-ideal effects into the DNN propagation and weight update steps. By introducing the adaptive “momentum” in the weight update rule, in situ training accuracy on CIFAR-10 could approach its software baseline even under severe asymmetry/nonlinearity and analog-to-digital converter quantization error. The hardware performance of the on-chip training architecture and the overhead for adding “momentum” are also evaluated. By optimizing the backpropagation dataflow, 23.59 TOPS/W training energy efficiency (12× improvement compared to naïve dataflow) is achieved. The circuits that handle “momentum” introduce only 4.2% energy overhead. Our results show great potential and more relaxed requirements that enable emerging non-volatile memories for DNN acceleration on the embedded artificial intelligence platforms.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"121 1","pages":"37:1-37:19"},"PeriodicalIF":0.0,"publicationDate":"2022-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85033125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An Efficient Execution Framework of Two-Part Execution Scenario Analysis 两部分执行场景分析的高效执行框架

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-01-31 DOI: 10.1145/3465474

Ding Han, Guohui Li, Quan Zhou, Jianjun Li, Yong Yang, Xiaofei Hu

{"title":"An Efficient Execution Framework of Two-Part Execution Scenario Analysis","authors":"Ding Han, Guohui Li, Quan Zhou, Jianjun Li, Yong Yang, Xiaofei Hu","doi":"10.1145/3465474","DOIUrl":"https://doi.org/10.1145/3465474","url":null,"abstract":"\u0000 Response Time Analysis\u0000 (\u0000 RTA\u0000 ) is an important and promising technique for analyzing the schedulability of real-time tasks under both\u0000 Global Fixed-Priority\u0000 (\u0000 G-FP\u0000 ) scheduling and\u0000 Global Earliest Deadline First\u0000 (\u0000 G-EDF\u0000 ) scheduling. Most existing RTA methods for tasks under global scheduling are dominated by partitioned scheduling, due to the pessimism of the\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 -based interference calculation where\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 is the number of processors. Two-part execution scenario is an effective technique that addresses this pessimism at the cost of efficiency. The major idea of two-part execution scenario is to calculate a more accurate upper bound of the interference by dividing the execution of the target job into two parts and calculating the interference on the target job in each part. This article proposes a novel RTA execution framework that improves two-part execution scenario by reducing some unnecessary calculation, without sacrificing accuracy of the schedulability test. The key observation is that, after the division of the execution of the target job, two-part execution scenario enumerates all possible execution time of the target job in the first part for calculating the final\u0000 Worst-Case Response Time\u0000 (\u0000 WCRT\u0000 ). However, only some special execution time can cause the final result. A set of experiments is conducted to test the performance of the proposed execution framework and the result shows that the proposed execution framework can improve the efficiency of two-part execution scenario analysis by up to\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 in terms of the execution time.\u0000","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"58 1","pages":"3:1-3:24"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83390279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Improving LDPC Decoding Performance for 3D TLC NAND Flash by LLR Optimization Scheme for Hard and Soft Decision 采用软硬判决LLR优化方案提高3D TLC NAND闪存LDPC解码性能

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-01-31 DOI: 10.1145/3473305

Lanlan Cui, Fei Wu, Xiaojian Liu, Meng Zhang, Renzhi Xiao, Changsheng Xie

{"title":"Improving LDPC Decoding Performance for 3D TLC NAND Flash by LLR Optimization Scheme for Hard and Soft Decision","authors":"Lanlan Cui, Fei Wu, Xiaojian Liu, Meng Zhang, Renzhi Xiao, Changsheng Xie","doi":"10.1145/3473305","DOIUrl":"https://doi.org/10.1145/3473305","url":null,"abstract":"\u0000 Low-density parity-check (LDPC)\u0000 codes have been widely adopted in NAND flash in recent years to enhance data reliability. There are two types of decoding, hard-decision and soft-decision decoding. However, for the two types, their error correction capability degrades due to inaccurate\u0000 log-likelihood ratio (LLR)\u0000 . To improve the LLR accuracy of LDPC decoding, this article proposes LLR optimization schemes, which can be utilized for both hard-decision and soft-decision decoding. First, we build a threshold voltage distribution model for 3D\u0000 floating gate (FG)\u0000 triple level cell (TLC)\u0000 NAND flash. Then, by exploiting the model, we introduce a scheme to quantize LLR during hard-decision and soft-decision decoding. And by amplifying a portion of small LLRs, which is essential in the layer min-sum decoder, more precise LLR can be obtained. For hard-decision decoding, the proposed new modes can significantly improve the decoder’s error correction capability compared with traditional solutions. Soft-decision decoding starts when hard-decision decoding fails. For this part, we study the influence of the reference voltage arrangement of LLR calculation and apply the quantization scheme. The simulation shows that the proposed approach can\u0000 reduce frame error rate (FER)\u0000 for several orders of magnitude.\u0000","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"138 1","pages":"5:1-5:20"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73058984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

A Comprehensive Survey of Attacks without Physical Access Targeting Hardware Vulnerabilities in IoT/IIoT Devices, and Their Detection Mechanisms 针对物联网/工业物联网设备硬件漏洞的无物理访问攻击及其检测机制的综合调查

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-01-31 DOI: 10.1145/3471936

Nikolaos Foivos Polychronou, Pierre-Henri Thevenon, Maxime Puys, V. Beroulle

{"title":"A Comprehensive Survey of Attacks without Physical Access Targeting Hardware Vulnerabilities in IoT/IIoT Devices, and Their Detection Mechanisms","authors":"Nikolaos Foivos Polychronou, Pierre-Henri Thevenon, Maxime Puys, V. Beroulle","doi":"10.1145/3471936","DOIUrl":"https://doi.org/10.1145/3471936","url":null,"abstract":"With the advances in the field of the Internet of Things (IoT) and Industrial IoT (IIoT), these devices are increasingly used in daily life or industry. To reduce costs related to the time required to develop these devices, security features are usually not considered. This situation creates a major security concern. Many solutions have been proposed to protect IoT/IIoT against various attacks, most of which are based on attacks involving physical access. However, a new class of attacks has emerged targeting hardware vulnerabilities in the micro-architecture that do not require physical access. We present attacks based on micro-architectural hardware vulnerabilities and the side effects they produce in the system. In addition, we present security mechanisms that can be implemented to address some of these attacks. Most of the security mechanisms target a small set of attack vectors or a single specific attack vector. As many attack vectors exist, solutions must be found to protect against a wide variety of threats. This survey aims to inform designers about the side effects related to attacks and detection mechanisms that have been described in the literature. For this purpose, we present two tables listing and classifying the side effects and detection mechanisms based on the given criteria.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"11 1","pages":"1:1-1:35"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72794372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Demand-Driven Multi-Target Sample Preparation on Resource-Constrained Digital Microfluidic Biochips 资源受限的数字微流控生物芯片需求驱动的多靶点样品制备

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-01-31 DOI: 10.1145/3474392

Sudip Poddar, Sukanta Bhattacharjee, Shao-Yun Fang, Tsung-Yi Ho, B. B. Bhattacharya

{"title":"Demand-Driven Multi-Target Sample Preparation on Resource-Constrained Digital Microfluidic Biochips","authors":"Sudip Poddar, Sukanta Bhattacharjee, Shao-Yun Fang, Tsung-Yi Ho, B. B. Bhattacharya","doi":"10.1145/3474392","DOIUrl":"https://doi.org/10.1145/3474392","url":null,"abstract":"\u0000 Microfluidic lab-on-chips offer promising technology for the automation of various biochemical laboratory protocols on a minuscule chip. Sample preparation (SP) is an essential part of any biochemical experiments, which aims to produce dilution of a sample or a mixture of multiple reagents in a certain ratio. One major objective in this area is to prepare dilutions of a given fluid with different concentration factors, each with certain volume, which is referred to as the demand-driven multiple-target (DDMT) generation problem. SP with microfluidic biochips requires proper sequencing of mix-split steps on fluid volumes and needs storage units to save intermediate fluids while producing the desired target ratio. The performance of SP depends on the underlying mixing algorithm and the availability of on-chip storage, and the latter is often limited by the constraints imposed during physical design. Since DDMT involves several target ratios, solving it under storage constraints becomes even harder. Furthermore, reduction of mix-split steps is desirable from the viewpoint of accuracy of SP, as every such step is a potential source of volumetric split error. In this article, we propose a storage-aware DDMT algorithm that reduces the number of mix-split operations on a digital microfluidic lab-on-chip. We also present the layout of the biochip with\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 -storage cells and their allocation technique for\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 . Simulation results reveal the superiority of the proposed method compared to the state-of-the-art multi-target SP algorithms.\u0000","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"236 1","pages":"7:1-7:21"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74185114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures 一种新的多核体系结构的混合缓存一致性与全局窥探

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-01-31 DOI: 10.1145/3462775

G. Harsha, Sujay Deb

{"title":"A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures","authors":"G. Harsha, Sujay Deb","doi":"10.1145/3462775","DOIUrl":"https://doi.org/10.1145/3462775","url":null,"abstract":"\u0000 Cache coherence ensures correctness of cached data in multi-core processors. Traditional implementations of existing protocols make them unscalable for many core architectures. While snoopy coherence requires unscalable ordered networks, directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through global snoopy protocol. The ordered network for global snooping is provided through low-latency and low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation to eliminate coherence for private blocks. Evaluations show that WiSH reduces traffic by\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 and runtime by\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 , while requiring\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 smaller storage and\u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 \u0000 lower energy as compared to existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many core processors.\u0000","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"21 1","pages":"2:1-2:31"},"PeriodicalIF":0.0,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84911917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Energy Efficient Error Resilient Multiplier Using Low-power Compressors 使用低功率压缩机的节能容错倍增器

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2022-01-01 DOI: 10.1145/3488837

S. Deepsita, M. Dhayalakumar, S. Mahammad

引用次数: 1

An Energy-Efficient Inference Method in Convolutional Neural Networks Based on Dynamic Adjustment of the Pruning Level 基于剪枝水平动态调整的卷积神经网络节能推理方法

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2021-08-01 DOI: 10.1145/3460972

M. A. Maleki, Alireza Nabipour-Meybodi, M. Kamal, A. Afzali-Kusha, M. Pedram

{"title":"An Energy-Efficient Inference Method in Convolutional Neural Networks Based on Dynamic Adjustment of the Pruning Level","authors":"M. A. Maleki, Alireza Nabipour-Meybodi, M. Kamal, A. Afzali-Kusha, M. Pedram","doi":"10.1145/3460972","DOIUrl":"https://doi.org/10.1145/3460972","url":null,"abstract":"In this article, we present a low-energy inference method for convolutional neural networks in image classification applications. The lower energy consumption is achieved by using a highly pruned (lower-energy) network if the resulting network can provide a correct output. More specifically, the proposed inference method makes use of two pruned neural networks (NNs), namely mildly and aggressively pruned networks, which are both designed offline. In the system, a third NN makes use of the input data for the online selection of the appropriate pruned network. The third network, for its feature extraction, employs the same convolutional layers as those of the aggressively pruned NN, thereby reducing the overhead of the online management. There is some accuracy loss induced by the proposed method where, for a given level of accuracy, the energy gain of the proposed method is considerably larger than the case of employing any one pruning level. The proposed method is independent of both the pruning method and the network architecture. The efficacy of the proposed inference method is assessed on Eyeriss hardware accelerator platform for some of the state-of-the-art NN architectures. Our studies show that this method may provide, on average, 70% energy reduction compared to the original NN at the cost of about 3% accuracy loss on the CIFAR-10 dataset.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"43 1","pages":"49:1-49:20"},"PeriodicalIF":0.0,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86972178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Machine Learning for Electronic Design Automation: A Survey 电子设计自动化中的机器学习:综述

ACM Trans. Design Autom. Electr. Syst. Pub Date : 2021-01-10 DOI: 10.1145/3451179

Guyue Huang, Jingbo Hu, Yifan He, Jialong Liu, Mingyuan Ma, Zhaoyang Shen, Juejian Wu, Yuanfan Xu, Hengrui Zhang, Kai Zhong, Xuefei Ning, Yuzhe Ma, Haoyu Yang, Bei Yu, Huazhong Yang, Yu Wang

引用次数: 114