IEEE Journal on Emerging and Selected Topics in Circuits and Systems最新文献_第8页

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information 电气和电子工程师学会电路与系统新专题与选题期刊》出版信息

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-09-16 DOI: 10.1109/JETCAS.2024.3450055

引用次数: 0

Guest Editorial Chip and Package-Scale Communication-Aware Architectures for General-Purpose, Domain-Specific, and Quantum Computing Systems 特邀编辑通用、特定领域和量子计算系统的芯片和封装级通信感知架构

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-09-16 DOI: 10.1109/JETCAS.2024.3445208

Abhijit Das;Maurizio Palesi;John Kim;Partha Pratim Pande

{"title":"Guest Editorial Chip and Package-Scale Communication-Aware Architectures for General-Purpose, Domain-Specific, and Quantum Computing Systems","authors":"Abhijit Das;Maurizio Palesi;John Kim;Partha Pratim Pande","doi":"10.1109/JETCAS.2024.3445208","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3445208","url":null,"abstract":"This Special Issue of IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS) is devoted to advancing the field of chip and package-scale communications across diverse computing domains, bridging academic research and industrial innovation. As we enter a new golden age of computer architecture, marked by both challenges and opportunities, the anticipated end of Moore’s law necessitates reimagining the future of computing systems as we approach the physical limits of transistors. Three leading approaches to address these challenges include the chiplet paradigm, domain-specific customization, and quantum computing. However, these architectural and technological innovations have shifted the primary bottleneck from computation to communication. Consequently, on-chip and on-package communication now play a critical role in determining the performance, efficiency, and scalability of general-purpose, domain-specific, and quantum computing systems. Their ever-growing importance has garnered significant attention from both academia and industry.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"349-353"},"PeriodicalIF":3.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10680692","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142235954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Information for Authors IEEE 《电路与系统新兴选题》期刊作者须知

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-09-16 DOI: 10.1109/JETCAS.2024.3450053

引用次数: 0

Modeling the Effect of SEUs on the Configuration Memory of SRAM-FPGA-Based CNN Accelerators 模拟 SEU 对基于 SRAM-FPGA 的 CNN 加速器配置存储器的影响

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-09-16 DOI: 10.1109/JETCAS.2024.3460792

Zhen Gao;Jiaqi Feng;Shihui Gao;Qiang Liu;Guangjun Ge;Yu Wang;Pedro Reviriego

{"title":"Modeling the Effect of SEUs on the Configuration Memory of SRAM-FPGA-Based CNN Accelerators","authors":"Zhen Gao;Jiaqi Feng;Shihui Gao;Qiang Liu;Guangjun Ge;Yu Wang;Pedro Reviriego","doi":"10.1109/JETCAS.2024.3460792","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3460792","url":null,"abstract":"Convolutional Neural Networks (CNNs) are widely used in computer vision applications. SRAM based Field Programmable Gate Arrays (SRAM-FPGAs) are popular for the acceleration of CNNs. Since SRAM-FPGAs are prone to soft errors, the reliability evaluation and efficient fault tolerance design become very important for the use of FPGA-based CNNs in safety critical scenarios. Hardware based fault injection is an effective approach for the reliability evaluation, and the results can provide valuable references for the fault tolerance design. However, the complexity of building a fault injection platform poses a big obstacle for researchers working on the fault tolerance design. To remove this obstacle, this paper first performs a complete reliability evaluation for errors on the configuration memory of the FPGA based CNN accelerators, and then studies the impact of errors on the output feature maps of each layer. Based on the statistical analysis, we propose several fault models for the effect of SEUs on the configuration memory of the FPGA based CNN accelerators, and build a software simulator based on the fault models. Experiments show that the evaluation results based on the software simulator are very close to those from the hardware fault injections. Therefore, the proposed fault models and simulator can facilitate the fault tolerance design and reliability evaluation of CNN accelerators.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 4","pages":"799-810"},"PeriodicalIF":3.7,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Stealthy Backdoor Attack Against Federated Learning Through Frequency Domain by Backdoor Neuron Constraint and Model Camouflage 利用后门神经元约束和模型伪装，通过频域对联合学习进行隐形后门攻击

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-08-27 DOI: 10.1109/JETCAS.2024.3450527

Yanqi Qiao;Dazhuang Liu;Rui Wang;Kaitai Liang

{"title":"Stealthy Backdoor Attack Against Federated Learning Through Frequency Domain by Backdoor Neuron Constraint and Model Camouflage","authors":"Yanqi Qiao;Dazhuang Liu;Rui Wang;Kaitai Liang","doi":"10.1109/JETCAS.2024.3450527","DOIUrl":"10.1109/JETCAS.2024.3450527","url":null,"abstract":"Federated Learning (FL) is a beneficial decentralized learning approach for preserving the privacy of local datasets of distributed agents. However, the distributed property of FL and untrustworthy data introducing the vulnerability to backdoor attacks. In this attack scenario, an adversary manipulates its local data with a specific trigger and trains a malicious local model to implant the backdoor. During inference, the global model would misbehave for any input with the trigger to the attacker-chosen prediction. Most existing backdoor attacks against FL focus on bypassing defense mechanisms, without considering the inspection of model parameters on the server. These attacks are susceptible to detection through dynamic clustering based on model parameter similarity. Besides, current methods provide limited imperceptibility of their trigger in the spatial domain. To address these limitations, we propose a stealthy backdoor attack called “Chironex” against FL with an imperceptible trigger in frequency space to deliver attack effectiveness, stealthiness and robustness against various countermeasures on FL. We first design a frequency trigger function to generate an imperceptible frequency trigger to evade human inspection. Then we fully exploit the attacker’s advantage to enhance attack robustness by estimating benign updates and analyzing the impact of the backdoor on model parameters through a task-sensitive neuron searcher. It disguises malicious updates as benign ones by reducing the impact of backdoor neurons that greatly contribute to the backdoor task based on activation value, and encouraging them to update towards benign model parameters trained by the attacker. We conduct extensive experiments on various image classifiers with real-world datasets to provide empirical evidence that Chironex can evade the most recent robust FL aggregation algorithms, and further achieve a distinctly higher attack success rate than existing attacks, without undermining the utility of the global model.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 4","pages":"661-672"},"PeriodicalIF":3.7,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142197276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Chip and Package-Scale Interconnects for General-Purpose, Domain-Specific, and Quantum Computing Systems—Overview, Challenges, and Opportunities 通用、特定领域和量子计算系统的芯片和封装级互连 - 概述、挑战和机遇

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-08-19 DOI: 10.1109/JETCAS.2024.3445829

Abhijit Das;Maurizio Palesi;John Kim;Partha Pratim Pande

{"title":"Chip and Package-Scale Interconnects for General-Purpose, Domain-Specific, and Quantum Computing Systems—Overview, Challenges, and Opportunities","authors":"Abhijit Das;Maurizio Palesi;John Kim;Partha Pratim Pande","doi":"10.1109/JETCAS.2024.3445829","DOIUrl":"10.1109/JETCAS.2024.3445829","url":null,"abstract":"The anticipated end of Moore’s law, coupled with the breakdown of Dennard scaling, compelled everyone to conceive forthcoming computing systems once transistors reach their limits. Three leading approaches to circumvent this situation are the chiplet paradigm, domain customisation and quantum computing. However, architectural and technological innovations have shifted the fundamental bottleneck from computation to communication. Hence, on-chip and on-package communication play a pivotal role in determining the performance, energy efficiency and scalability of general-purpose, domain-specific and quantum computing systems. This article reviews the recent advances in chip and package-scale interconnects due to the change in architecture, application and technology. The primary objective of this article is to present the current status, key challenges, and impact-worthy opportunities in this research area from the perspective of hardware architectures. The secondary objective of this article is to serve as a tutorial providing an overview of academic and industrial explorations in chip and package-scale communication infrastructure design for general-purpose, domain-specific and quantum computing systems.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"354-370"},"PeriodicalIF":3.7,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10638543","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142197237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

GNS: Graph-Based Network-on-Chip Shield for Early Defense Against Malicious Nodes in MPSoC GNS：基于图的片上网络屏蔽，用于早期防御 MPSoC 中的恶意节点

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-08-05 DOI: 10.1109/JETCAS.2024.3438435

Haoyu Wang;Jianjie Ren;Basel Halak;Ahmad Atamli

{"title":"GNS: Graph-Based Network-on-Chip Shield for Early Defense Against Malicious Nodes in MPSoC","authors":"Haoyu Wang;Jianjie Ren;Basel Halak;Ahmad Atamli","doi":"10.1109/JETCAS.2024.3438435","DOIUrl":"10.1109/JETCAS.2024.3438435","url":null,"abstract":"In the rapidly evolving landscape of system design, Multi-Processor Systems-on-Chip (MPSoCs) have experienced significant growth in both scale and complexity, by integrating an array of Intellectual Properties (IPs) through Network-on-Chip (NoC) to execute complex parallel applications. However, this advancement has led to the emergence of security attacks caused by Malicious Third-Party IPs (M3PIPs), such as Denial-of-Service (DoS). Many current methods for detecting DoS attacks involve significant hardware overhead and are often inefficient in identifying anomalies at an early stage. Addressing this gap, we propose the Graph-based NoC Shield (GNS), a robust strategy meticulously crafted to detect, localize, and isolate malicious IPs at the very early stage of DoS appearance. Central to our approach is the use of a Graph Neural Network (GNN) and Long Short-Term Memory (LSTM) detection model. This combination capitalizes on network traffic data and routing dependency graphs to efficiently trace the source of network congestion and pinpoint attackers. Our extensive experimental analysis validates the effectiveness of the GNS framework, demonstrating a 98% detection accuracy and localization capabilities, achieved with minimal hardware overhead of 1.8% in each router, based on a pure 4*4 Mesh NoC system. The detection performance exceeds that of all other state-of-the-art works and most straightforward single machine learning inference models within the same context. Additionally, the hardware overhead is notably superior compared to other security schemes. Another key feature of our system is the implementation of a credit interposing mechanism. It was specifically designed to isolate M3PIPs engaging in Flooding-based DoS and effectively mitigate the spread of malicious traffic. This approach significantly enhances the security of NoC-based MPSoCs, offering early-stage detection with the superior accuracy compared to other models. Crucially, the GNS achieves this with up to 75% less hardware overhead than state-of-the-art solutions, thus striking a balance between efficiency and effectiveness in security implementation.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"483-494"},"PeriodicalIF":3.7,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Investigating Register Cache Behavior: Implications for CUDA and Tensor Core Workloads on GPUs 研究寄存器缓存行为：对 GPU 上 CUDA 和张量核心工作负载的影响

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-08-05 DOI: 10.1109/JETCAS.2024.3439193

Vahid Geraeinejad;Qiran Qian;Masoumeh Ebrahimi

{"title":"Investigating Register Cache Behavior: Implications for CUDA and Tensor Core Workloads on GPUs","authors":"Vahid Geraeinejad;Qiran Qian;Masoumeh Ebrahimi","doi":"10.1109/JETCAS.2024.3439193","DOIUrl":"10.1109/JETCAS.2024.3439193","url":null,"abstract":"GPUs are extensively employed as the primary devices for running a broad spectrum of applications, covering general-purpose applications as well as Artificial Intelligence (AI) applications. Register file, as the largest SRAM on the GPU die, accounts for over 20% of the total GPU energy consumption. Register cache has been introduced to reduce traffic from the register file and thus decrease total energy consumption when CUDA cores are utilized. However, the utilization of register cache has not been thoroughly investigated for Tensor Cores which are integrated into recent GPU architectures to meet AI workload demands. In this paper, we study the usage of register cache in both CUDA and Tensor Cores and conduct a thorough examination of their pros and cons. We have developed an open-source analytical simulator, called RFC-sim, to model and measure the energy consumption of both the register file and register cache. Our results show that while the register cache can reduce energy consumption by up to 40% in CUDA cores, it results in increased energy consumption by up to 23% in Tensor Cores. The main reason lies in the limited space of the register cache, which is not sufficient for the demand of Tensor cores to capture locality.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"469-482"},"PeriodicalIF":3.7,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141934967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Dynamic Fault Tolerance Approach for Network-on-Chip Architecture 片上网络架构的动态容错方法

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-08-05 DOI: 10.1109/JETCAS.2024.3438250

Kasem Khalil;Ashok Kumar;Magdy Bayoumi

{"title":"Dynamic Fault Tolerance Approach for Network-on-Chip Architecture","authors":"Kasem Khalil;Ashok Kumar;Magdy Bayoumi","doi":"10.1109/JETCAS.2024.3438250","DOIUrl":"10.1109/JETCAS.2024.3438250","url":null,"abstract":"Network-on-Chip (NoC) architecture provides speed-efficient and scalable communication in complex integrated circuits. Attaining fault tolerance in NoC architectures is an ongoing research problem aiming to enhance the architecture’s reliability and performance. It seeks to mitigate the impact of router failures and enhance the overall system robustness. Fault tolerance is achieved by adding additional hardware, and the research challenge is to attain high reliability, high Mean Time To Failure (MTTF), and low Energy-Delay-Product (EDP) while sacrificing an acceptable area. It is particularly vital for applications with uninterrupted data flow. This paper proposes a fault-tolerance approach for NoC systems focusing on NoC routers to yield increased reliability and MTTF with an acceptable area overhead and low EDP. The proposed method proposes a dynamic reconfiguration mechanism by using a dynamic allocation of virtual channels and a bypass crossbar mechanism, ensuring uninterrupted data flow within the NoC. Evaluations of the proposed method are done on different mesh sizes using VHDL and an Altera 10GX FPGA, demonstrating the method’s superiority in reliability, reduced latency, and enhanced throughput. The results show that the proposed method has an acceptable area overhead of 25.3%, and its MTTF values are 3.7 times to 18 times higher than the traditional methods for varying network sizes, showing remarkable robustness against faults. The results show that the proposed method attains the best-reported reliability with the least EDP. Additionally, a layout of the circuit is also created and studied.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"384-394"},"PeriodicalIF":3.7,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141968836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Communication-Aware and Resource-Efficient NoC-Based Architecture for CNN Acceleration 基于 NoC 的通信感知和资源高效的 CNN 加速体系结构

IF 3.7 2区工程技术

IEEE Journal on Emerging and Selected Topics in Circuits and Systems Pub Date : 2024-08-02 DOI: 10.1109/JETCAS.2024.3437408

Huidong Ji;Chen Ding;Boming Huang;Yuxiang Huan;Li-Rong Zheng;Zhuo Zou

{"title":"Communication-Aware and Resource-Efficient NoC-Based Architecture for CNN Acceleration","authors":"Huidong Ji;Chen Ding;Boming Huang;Yuxiang Huan;Li-Rong Zheng;Zhuo Zou","doi":"10.1109/JETCAS.2024.3437408","DOIUrl":"10.1109/JETCAS.2024.3437408","url":null,"abstract":"Exploding development of convolutional neural network (CNN) benefits greatly from the hardware-based acceleration to maintain low latency and high utilization of resources. To enhance the processing efficiency of CNN algorithms, Field Programming Gate Array (FPGA)-based accelerators are designed with increased hardware resources to achieve high parallelism and throughput. However, there exist bottlenecks when more processing elements (PEs) in the form of PE clusters are introduced, including 1) the under-utilization of FPGA’s fixed hardware resources, which leads to the effective and peak performance mismatch; and 2) the limited clock frequency caused by the sophisticated routing and complex placement. In this paper, a 2-level hierarchical Network-on-Chip (NoC)-based CNN accelerator is proposed. In the upper level, a mesh-based NoC that interconnects multiple PE clusters is introduced. Such a design not only provides increased flexibility to balance different data communication models for better PE utilization and energy efficiency but also enables globally asynchronous, locally synchronous (GALS) architecture for better timing closure. At the lower level, local PEs are organized into a 3D-tiled PE cluster aiming to maximize the data reuse exploiting inherent dataflow of the convolution networks. Implementation and experiments on Xilinx ZU9EG FPGA for 4 benchmark CNN models: ResNet50, ResNet34, VGG16, and Darknet19 show that our work operates at a frequency of 300 MHz and delivers an effective throughput of 0.998 TOPS, 1.022 TOPS, 1.024 TOPS, and 1.026 TOPS. This result corresponds to 92.85%, 95.1%, 95.25%, and 95.46% PE utilization. Compared with the related FPGA-based designs, our work improves the resource efficiency of DSP by \u0000<inline-formula> <tex-math>$5.36times $ </tex-math></inline-formula>\u0000, \u0000<inline-formula> <tex-math>$1.62times $ </tex-math></inline-formula>\u0000, \u0000<inline-formula> <tex-math>$1.96times $ </tex-math></inline-formula>\u0000, and \u0000<inline-formula> <tex-math>$5.83times $ </tex-math></inline-formula>\u0000, respectively.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"440-454"},"PeriodicalIF":3.7,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0