{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information","authors":"","doi":"10.1109/TCAD.2025.3566794","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3566794","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"C3-C3"},"PeriodicalIF":2.7,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11007761","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DBB-ECC: Random Double Bit and Burst Error Correction Code for HBM3","authors":"Chaehyeon Shin;Jongsun Park","doi":"10.1109/TCAD.2025.3544964","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3544964","url":null,"abstract":"As dynamic random access memory (DRAM) technology continues to scale down, DRAM vendors have adopted on-die error correction codes (on-die ECC) to address reliability problems caused by cell failures. For burst error correction, a single symbol correction (SSC) Reed-Solomon (RS) code is utilized in high bandwidth memory (HBM) 3. However, randomly scattered errors frequently occur with aggressive technology scaling, which necessitates a more robust error correction code (ECC) scheme that addresses both burst errors and scattered errors. This brief presents double bit and burst ECC (DBB-ECC), an efficient scheme designed to correct both single symbol errors and random double bit errors with reduced implementation overhead. In the proposed decoding, syndromes based on SSC RS codes are used to address both error types without increasing parity bits. The decoder complexity has also been reduced by exploiting the syndrome patterns of double bit errors. The experimental results show that the proposed solution needs lower implementation overhead than conventional ones while maintaining the same level of correction capability. Compared to the conventional SSC code, it also significantly enhances HBM3 reliability without increasing storage overhead.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3236-3240"},"PeriodicalIF":2.7,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144663728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems publication information","authors":"","doi":"10.1109/TCAD.2025.3534398","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3534398","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 3","pages":"1209-1209"},"PeriodicalIF":2.7,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10896928","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems society information","authors":"","doi":"10.1109/TCAD.2025.3537647","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3537647","url":null,"abstract":"","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 3","pages":"C2-C2"},"PeriodicalIF":2.7,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10896933","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Re-CATA: Real-Time and Flexible Accelerator Design Framework for On-Device Codec Avatars","authors":"Yongan Zhang;Yuecheng Li;Syed Shakib Sarwar;H. Ekin Sumbul;Yonggan Fu;Haoran You;Cheng Wan;Yingyan Lin","doi":"10.1109/TCAD.2025.3539600","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539600","url":null,"abstract":"Real-time Codec Avatars, which employ deep generative models for 3-D reconstruction of human features, are crucial for immersive telepresence in augmented reality and virtual reality (AR/VR) environments. However, deploying these avatars in real time on AR/VR headsets is challenging due to the inability of existing devices to achieve satisfying performance within stringent hardware resource constraints. To address these challenges, we introduce Re-CATA, an innovative full-stack and flexible Codec Avatar accelerator design framework. Re-CATA is designed to deliver real-time throughput (greater than 120 FPS) for the complete Codec Avatar processing pipeline within an edge-level power budget of 5 W under FPGA prototyping. Our approach begins by abstracting the operation mapping and scheduling challenges inherent in Codec Avatars, which require both centralized and distributed processing to handle dynamically changing workloads. We propose a novel hardware resource and workload partitioning scheme optimized for these fluctuating demands. To complement this, we introduce an agile runtime scheduling system for efficient workload reallocation among computing units as needed, recognizing the limitations of static partitioning in rapidly evolving workload scenarios. Furthermore, our micro-architecture design incorporates unified computing modules and efficient hardware peripherals, enabling seamless workload balancing across the Codec Avatar processing pipeline. We evaluate the Re-CATA accelerators via on-board FPGA prototyping, comparing them to various baselines, including commercial AR/VR system-on-chips and academic accelerators. This evaluation demonstrates a speedup of up to <inline-formula> <tex-math>$5.95\\times $ </tex-math></inline-formula> under similar settings.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3020-3033"},"PeriodicalIF":2.7,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-Pass: An Operation Unit-Based In-Memory Computing Architecture for Sparse Neural Networks","authors":"Shang Wang;Qi Cao;Yongqiang Wang;Hang Chen;Zhenjiao Chen;Feng Liang","doi":"10.1109/TCAD.2025.3539592","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539592","url":null,"abstract":"Compute-in-memory (CIM) has emerged as a prominent research focus in recent years, offering a promising alternative for advancing traditional von Neumann architecture computers. However, the extensive array structures and peripheral circuits inherent in CIM introduce challenges related to latency and power consumption. The operation unit (OU) has gained attention as a practical solution to these issues, significantly transforming the computational paradigm of in-memory computing. Despite its potential, the possibilities enabled by this approach remain underexplored. This article presents a novel architecture, single-pass, designed around OU implementation with a new OU partitioning method optimized for sparse networks. Additionally, we propose a matrix compression technique leveraging a dual heuristic greedy algorithm (DHGA), forming the foundation of our architecture-specific mapping strategy. Experimental results demonstrate that, within given area constraints, our architecture achieves an average energy efficiency improvement of 29.8% and a speedup of 82.3% across various networks compared to the baseline.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"2952-2965"},"PeriodicalIF":2.7,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shallow Quantum Circuit Implementation of Symmetric Functions With Limited Ancillary Qubits","authors":"Wei Zi;Junhong Nie;Xiaoming Sun","doi":"10.1109/TCAD.2025.3539002","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539002","url":null,"abstract":"Optimizing the depth and number of ancillary qubits in quantum circuits is crucial in quantum computation, given the limitations imposed by current quantum devices. In this article, we introduce an innovative approach for implementing arbitrary symmetric Boolean functions using poly-logarithmic depth quantum circuits with only a logarithmic number of ancillary qubits. Symmetric functions are those whose outputs are dictated solely by the Hamming weight of the inputs. These functions find applications across various domains, including quantum machine learning and arithmetic circuit synthesis. Moreover, by fully leveraging the potential of qutrits, the ancilla count can be further reduced to just one. The key technique involves a novel poly-logarithmic depth quantum circuit designed to compute Hamming weight without the need for ancillary qubits. This quantum circuit for Hamming weight is of independent interest due to its wide-ranging applications, such as in quantum memory, quantum machine learning, and Hamiltonian dynamics simulations.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3060-3072"},"PeriodicalIF":2.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Device Design Guidelines to Boost Up AC Performance of CFET (Complementary Field-Effect-Transistor)-Based Inverter","authors":"Jaehyuk Lim;Donghwan Han;Juho Sung;Seokchan Yoon;Sanghyun Kang;Gwon Kim;Hyoung Won Baac;Changhwan Shin","doi":"10.1109/TCAD.2025.3539599","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539599","url":null,"abstract":"Complementary field-effect transistors (CFETs) have emerged as promising candidates for next-generation semiconductor devices. CFETs feature a structure with an nMOS (or pMOS) transistor at the bottom and a transistor of the opposite type at the top. CFETs can be classified into Fin-CFETs or GAA-CFETs based on their channel structure. In this study, we compare and analyze these two devices to determine which structure is more favorable for device scaling and which device exhibits better performance per unit area. For a reliable analysis, the threshold voltage was adjusted to be the same for all devices. Initially, to compare the DC performance, the on-state drive currents in both linear mode and saturation mode operations were extracted and compared from the <inline-formula> <tex-math>$I_{\\mathrm{DS}}$ </tex-math></inline-formula>-versus-<inline-formula> <tex-math>$V_{\\mathrm{GS}}$ </tex-math></inline-formula> input-transfer characteristics. Subsequently, complementary metal-oxide-semiconductor inverters were constructed to compare their AC performance. Six parameters were extracted and compared: high-to-low propagation delay (<inline-formula> <tex-math>$t_{pHL}$ </tex-math></inline-formula>), falling time (<inline-formula> <tex-math>$t_{f}$ </tex-math></inline-formula>), low-to-high propagation delay (<inline-formula> <tex-math>$t_{pLH}$ </tex-math></inline-formula>), rising time (<inline-formula> <tex-math>$t_{r}$ </tex-math></inline-formula>), overshoot voltage (<inline-formula> <tex-math>$V_{ov}$ </tex-math></inline-formula>), and undershoot voltage (<inline-formula> <tex-math>$V_{und}$ </tex-math></inline-formula>). Based on the results, we suggest which CFET structure is more suitable for device scaling.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3189-3196"},"PeriodicalIF":2.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144663730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RoboSpike: Fully Utilizing the Heterogeneous System With Subcallback Scheduling in ROS 2","authors":"Hongyi Li;Qingyuan Yang;Songchen Ma;Rong Zhao;Xinglong Ji","doi":"10.1109/TCAD.2025.3538615","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3538615","url":null,"abstract":"The advancement in artificial intelligence (AI) has greatly propelled the development of robotics, requiring the adoption of heterogeneous computing architectures with multicore CPUs, GPUs, and accelerators to meet the growing computational needs of edge computing. Such heterogeneity, coupled with the inherently I/O-intensive nature of robotic applications, poses substantial challenges for task scheduling and resource management. These challenges are particularly acute for systems striving to maximize computational resource utilization, which cannot be effectively addressed through callback-level scheduling. To overcome these obstacles, we developed RoboSpike, a systematic solution built on the Robot Operating System 2 (ROS 2). We first implemented a subcallback scheduling mechanism that uses coroutines to reclaim CPU time otherwise spent blocked on I/O operations. Building on this mechanism, we extended the design to incorporate the coprocessor and introduced an auto-tuning algorithm to adapt to system performance variations. Finally, we performed a response time analysis to ensure that RoboSpike's timing behavior is predictable. The evaluation results demonstrate that RoboSpike achieves substantial improvements, increasing throughput by 1.65–2.25 times in real-world scenarios. RoboSpike enhances the scheduling capabilities of ROS 2 by refining the scheduling granularity from the callback level to the subcallback level, thus opening up new opportunities for performance improvement in robotic systems, especially in resource-limited scenarios with complex workloads.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"2897-2910"},"PeriodicalIF":2.7,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coexisting Hyperchaos in a Memristive Neuromorphic Oscillator","authors":"Xin Zhang;Chunbiao Li;Tengfei Lei;Herbert Ho-Ching Iu;Tomasz Kapitaniak","doi":"10.1109/TCAD.2025.3538692","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3538692","url":null,"abstract":"Memristors have been widely integrated into neurons as the bridge for introducing external magnetic induction currents. The complex oscillation induced by the external magnetic stimulation is a hot topic in neuron dynamics. When a memristor is introduced into the Hindmarsh-Rose (HR) neuron to simulate the external magnetic field, a novel memristive neuromorphic hyperchaotic oscillator is constructed. The memristor weight can trigger complex neuronal firing dynamics, including the rare hyperchaotic bursting. Furthermore, when the technology of offset boosting-oriented attractor doubling is employed, a double-scroll hyperchaotic attractor can be generated, which could split into three independent coexisting attractors under some specific offsets. More interestingly, two symmetric periodic attractors and two symmetric hyperchaotic attractors can coexist under certain conditions. In this work, a neuron with coexisting hyperchaotic attractors is constructed and exhaustively explored, which provides a good candidate for constructing large-scale brain-like neuromorphic oscillators. A PCB-based hardware circuit produces the oscillations, validating the numerical simulations and theoretical analyses.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3179-3188"},"PeriodicalIF":2.7,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144663729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}