{"title":"OpenPiton4HPC: Optimizing OpenPiton Toward High-Performance Manycores","authors":"Neiel Leyva;Alireza Monemi;Noelia Oliete-Escuín;Guillem López-Paradís;Xabier Abancens;Jonathan Balkind;Enrique Vallejo;Miquel Moretó;Lluc Alvarez","doi":"10.1109/JETCAS.2024.3428929","DOIUrl":"10.1109/JETCAS.2024.3428929","url":null,"abstract":"In recent years, numerous multicore RISC-V platforms have emerged. Development frameworks such as OpenPiton are employed in designs that aim to scale to a large number of cores. While OpenPiton offers great flexibility, supporting different requirements and processing cores, some of its design decisions result in designs that are not optimized for High-Performance Computing (HPC) requirements. This work presents OpenPiton4HPC, an extension and optimization of OpenPiton for high-performance manycores. The key contributions are enabling multiple memory controllers, supporting router bypassing and NoC concentration, adding support for configurable cache sizes and cache block sizes, and allowing configurable bus widths in the NoC and in the cache SRAMs. On a 64-core manycore architecture, these new features and optimizations provide a geometric mean speedup of 7.2x compared to the OpenPiton baseline.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"395-408"},"PeriodicalIF":3.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum Cryptanalysis of Affine Cipher","authors":"Mahima Mary Mathews;Panchami V;Vishnu Ajith","doi":"10.1109/JETCAS.2024.3428436","DOIUrl":"10.1109/JETCAS.2024.3428436","url":null,"abstract":"Quantum Algorithms reduce the computational complexity or solve certain difficult problems that were originally impossible to solve with classical computers. Grover’s search algorithm is a Quantum computation algorithm that can find target elements from a set of unstructured data with the best possible $O(\\sqrt{N})$ queries. Grover’s search Quantum circuits implemented accurately can be used to successfully search and find the keys of Symmetric ciphers. However, very few demonstrations of such practical cryptanalysis are available. In this paper, practical Quantum cryptanalysis circuits for Affine Cipher are proposed and demonstrated, that successfully break the cipher by finding the keys.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"507-519"},"PeriodicalIF":3.7,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141720516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HALO: Communication-Aware Heterogeneous 2.5-D System for Energy-Efficient LLM Execution at Edge","authors":"Abhi Jaiswal;K. C. Sharin Shahana;Sujitha Ravichandran;K. Adarsh;H. Bharath Bhat;Biresh Kumar Joardar;Sumit K. Mandal","doi":"10.1109/JETCAS.2024.3427421","DOIUrl":"10.1109/JETCAS.2024.3427421","url":null,"abstract":"Large Language Models (LLMs) are used to perform various tasks, especially in the domain of natural language processing (NLP). State-of-the-art LLMs consist of a large number of parameters that necessitate a high volume of computations. Currently, GPUs are the preferred choice of hardware platform to execute LLM inference. However, monolithic GPU-based systems executing large LLMs pose significant drawbacks in terms of fabrication cost and energy efficiency. In this work, we propose a heterogeneous 2.5D chiplet-based architecture for accelerating LLM inference. The proposed 2.5D system consists of heterogeneous chiplets connected via a network-on-package (NoP). In the proposed 2.5D system, we leverage the energy efficiency of in-memory computing (IMC) and the general-purpose computing capability of CMOS-based floating point units (FPUs). The 2.5D technology helps to integrate two different technologies (IMC and CMOS) on the same system. Due to a large number of parameters, communication between chiplets becomes a significant performance bottleneck if not optimized while executing LLMs. To this end, we propose a communication-aware scalable technique to map different pieces of computations of an LLM onto different chiplets. The proposed mapping technique minimizes the communication energy and latency over the NoP, and is significantly faster than existing optimization techniques. Thorough experimental evaluations with a wide variety of LLMs show that the proposed 2.5D system provides up to $972\\times$ improvement in latency and $1600\\times$ improvement in energy consumption with respect to state-of-the-art edge devices equipped with GPU.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"425-439"},"PeriodicalIF":3.7,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141610590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Fidelity-Oriented Entanglement Distribution for Quantum Switches","authors":"Ziyue Jia;Lin Chen","doi":"10.1109/JETCAS.2024.3425712","DOIUrl":"10.1109/JETCAS.2024.3425712","url":null,"abstract":"We consider a star-shaped quantum network with a quantum switch in the center serving a number of requests, each characterized by two non-classical QoS requirements, the end-to-end entanglement delivery rate and the fidelity of the delivered entanglements. The central task of the switch is to allocate the limited entanglement resources among requests to maximize the system performance. We formulate the fundamental entanglement distribution problem where the switch decides 1) which requests to admit, and 2) as multiple requests may share the same quantum link, how to distribute the limited link-level entanglement resources among those competing requests. We then design a framework of joint entanglement purification scheduling and distribution for quantum switches. Our entanglement purification scheduling algorithm seeks to use minimal link-level entanglement resources to satisfy the QoS requirement of a single request. Our entanglement distribution algorithm further allocates the limited entanglement resources among multiple requests to maximize the overall utility by integrating the designed entanglement purification scheduling algorithm. We establish a theoretical performance guarantee for our proposal, which is complemented by extensive numerical experiments demonstrating its effectiveness in a variety of network settings.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"495-506"},"PeriodicalIF":3.7,"publicationDate":"2024-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Delay-Constrained GNR Routing With CNT-Via Insertion in Nano-Scale Designs","authors":"Jin-Tai Yan","doi":"10.1109/JETCAS.2024.3424217","DOIUrl":"10.1109/JETCAS.2024.3424217","url":null,"abstract":"It is well known that graphene nanoribbon (GNR) can be used as interconnects in nano-scale designs. In this paper, given a set of delay-constrained GNR nets in a multiple-layer routing plane, based on the construction of a combined carbon nanotube (CNT)/graphene hetero-structure for CNT-vias between two adjacent layers, an efficient routing algorithm is proposed to minimize the number of used layers while satisfying the non-crossing constraints between two GNR nets and the delay constraints on the GNR nets in GNR routing with CNT-via insertion. In the initial assignment, based on the definition of the delay-constrained routing pattern on a GNR net with tight delay constraint and the delay-constrained via path on a GNR net, the delay-constrained routing patterns are first assigned for layer minimization, and the delay-driven minimum-length routing paths and the delay-constrained via paths are then assigned onto the available layers. In the iterative routing, the unrouted GNR nets are further routed on the available layers and some possible new layers by using an iterative maze-routing and rip-up-and-rerouting process. Compared with published routing algorithms with no via insertion, the experimental results show that the proposed routing algorithm with CNT-via insertion uses shorter wirelength and reduces the number of used layers by 53.8% and 24.9% on average, respectively, under reasonable CPU time for two different sets of delay constraints on 8 tested examples.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"371-383"},"PeriodicalIF":3.7,"publicationDate":"2024-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Efficient and Rotationally Adjustable Millimeter-Wave Wireless Interconnects","authors":"Abhishek Sharma;Yanghyo Rod Kim","doi":"10.1109/JETCAS.2024.3422371","DOIUrl":"10.1109/JETCAS.2024.3422371","url":null,"abstract":"Conventional interconnects experience significant mechanical durability, mobility, and signal integrity challenges when dealing with moving parts or implementing extensive interconnect networks. As a result, they often hinder the performance of advanced autonomous and high-performance computing systems. This paper presents a fully rotatable and diagonally flexible ultra-short distance (≈ 1 mm) wireless interconnect. The proposed wireless interconnect comprises a 57-GHz transceiver integrated with a folded dipole antenna through wire bonding, enabling a flexible contactless connection. Here, two folded dipoles communicate in the Fresnel zone (radiative near-field), where we leverage the longitudinal electric fields to alleviate the polarization mismatch over the entire rotation angle. We have implemented a non-coherent on-off keying (OOK) modulation scheme and employed an automatic gain control (AGC) loop and offset canceling feedback loop to compensate for the transmission degradation and signal imbalance. The proposed system consumes 58.2 mW of power under a 1 V supply while transferring data at a rate of 10-Gb/s, achieving 5.82-pJ/bit energy efficiency.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"551-562"},"PeriodicalIF":3.7,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141548575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Highly-Scalable Deep-Learning Accelerator With a Cost-Effective Chip-to-Chip Adapter and a C2C-Communication-Aware Scheduler","authors":"Jicheon Kim;Chunmyung Park;Eunjae Hyun;Xuan Truong Nguyen;Hyuk-Jae Lee","doi":"10.1109/JETCAS.2024.3421553","DOIUrl":"10.1109/JETCAS.2024.3421553","url":null,"abstract":"Multi-chip-module (MCM) technology heralds a new era for scalable DNN inference systems, offering a cost-effective alternative to large-scale monolithic designs by lowering fabrication and design costs. Nevertheless, MCMs often incur resource and performance overheads due to inter-chip communication, which largely reduce a performance gain in a scaling-out system. To address these challenges, this paper introduces a highly-scalable DNN accelerator with a lightweight chip-to-chip adapter (C2CA) and a C2C-communication-aware scheduler. Our design employs a C2CA for inter-chip communication, which accurately illustrates an MCM system with a constrained C2C bandwidth, e.g., about 1/16, 1/8, or 1/4 of an on-chip bandwidth. We empirically reveal that the limited C2C bandwidth largely affects the overall performance gain of an MCM system. For example, compared with the one-core engine, a four-chip MCM system with a constrained C2C bandwidth only achieves $2.60\\times$, $3.27\\times$, $2.84\\times$, and $2.74\\times$ performance gains on ResNet50, DarkNet19, MobileNetV1, and EfficientNetS, respectively. Mitigating the problem, we propose a novel C2C-communication-aware scheduler with forward and backward inter-layer scheduling. Specifically, our scheduler effectively utilizes a C2C bandwidth while a core is performing its own computation. To demonstrate the effectiveness and practicality of our concept, we modeled our design with Verilog HDL and implemented it on an FPGA board, i.e., Xilinx ZCU104. The experimental results demonstrate that the system shows significant throughput improvements compared to a single-chip configuration, yielding average enhancements of $1.87\\times$ and $3.43\\times$ for two-chip and four-chip configurations, respectively, on ResNet50, DarkNet19, MobileNetV1, and EfficientNetS.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 3","pages":"455-468"},"PeriodicalIF":3.7,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141522295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Secure Consensus Control for Constrained Multi-Agent Systems Against Intermittent Denial-of-Service Attacks: An Adaptive Dynamic Programming Method","authors":"Zhen Gao;Ning Zhao;Guangdeng Zong;Xudong Zhao","doi":"10.1109/JETCAS.2024.3420396","DOIUrl":"10.1109/JETCAS.2024.3420396","url":null,"abstract":"Combining the use of the adaptive dynamic programming method and optimized backstepping strategy, this paper focuses on the secure consensus problem for constrained nonlinear multi-agent systems (MASs) subject to denial-of-service (DoS) attacks and input delay. Since network channels between some agents often suffer from intrusions by attackers during data transmission, we consider information transfers in both attack-sleep and attack-active scenarios, and construct a novel distributed observer with a switched mechanism to estimate the leader’s state information. In order to optimize system performance while ensuring that the system states do not exceed constraint sets, a new performance index function and a tan-type barrier Lyapunov function (BLF) are introduced. Besides, by employing the Padé approximation and an intermediate variable, the effect of input delay is removed. As a consequence, the proposed optimal control can smoothly steer the nonlinear MASs to realize the followers-leader consensus tracking goal, and all system states are consistently constrained within their compact sets. Finally, simulation results verify the effectiveness of this control scheme.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 4","pages":"705-716"},"PeriodicalIF":3.7,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10577119","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141508148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems information for authors","authors":"","doi":"10.1109/JETCAS.2024.3417549","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3417549","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 2","pages":"348-348"},"PeriodicalIF":3.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579095","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141494812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems Publication Information","authors":"","doi":"10.1109/JETCAS.2024.3405090","DOIUrl":"https://doi.org/10.1109/JETCAS.2024.3405090","url":null,"abstract":"","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"14 2","pages":"C2-C2"},"PeriodicalIF":3.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10579073","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141495163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}