IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems: Latest Articles, Page 8

HALTRAV: Design of a High-Performance and Area-Efficient Latch With Triple-Node-Upset Recovery and Algorithm-Based Verifications

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-12-04 DOI: 10.1109/TCAD.2024.3511335

Xing Guo;Jiajia Zhang;Xu Meng;Zhenmin Li;Xiaoqing Wen;Patrick Girard;Bin Liang;Aibin Yan

{"title":"HALTRAV: Design of a High-Performance and Area-Efficient Latch With Triple-Node-Upset Recovery and Algorithm-Based Verifications","authors":"Xing Guo;Jiajia Zhang;Xu Meng;Zhenmin Li;Xiaoqing Wen;Patrick Girard;Bin Liang;Aibin Yan","doi":"10.1109/TCAD.2024.3511335","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3511335","url":null,"abstract":"With the rapid advancement of semiconductor technologies, latches have become increasingly sensitive to soft errors, especially triple node upsets (TNUs), in harsh radiation environments. In this article, we first propose a high-performance and area-efficient latch, namely, HALTRAV, featuring complete TNU-recovery. The storage portion of HALTRAV consists of 28 interlocked source-drain cross-coupled inverters (SCIs) for complete TNU-recovery with area efficiency and low delay. To mitigate the issue that node-upset-recovery verifications for existing latches rely heavily on electronic design automation tools, we further propose an algorithm-based verification method that can automatically verify the node-upset-recovery of latches, which greatly simplifies the reliability-verification flow. Simulation results demonstrate the TNU-recovery of HALTRAV and also show that HALTRAV achieves 40.38%, 8.17%, and 31.89% reduction in delay, area, and delay-power-area product (DPAP) on average, respectively, compared to typical TNU-recoverable latches; however, this comes at the cost of power. 
Comparison results also demonstrate the moderate sensitivity of HALTRAV to the impacts of the process, voltage, and temperature (PVT) variations.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2367-2377"},"PeriodicalIF":2.7,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
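Illustrative sketch (not from the paper): the abstract describes an algorithm-based check that a latch recovers from node upsets without relying on circuit simulation. The toy below shows the general shape such a check could take: enumerate every combination of k flipped nodes, let a behavioral model settle, and compare against the stored value. The 5-node majority-feedback model, `N_NODES`, `step`, `settle`, and `recovers_from_all_upsets` are all hypothetical stand-ins; they do not model HALTRAV's 28 SCIs or the authors' algorithm.

```python
from itertools import combinations

N_NODES = 5  # hypothetical toy size, not HALTRAV's actual node count


def step(state):
    """One synchronous update: each node follows the majority of the other
    nodes and keeps its own value on a tie."""
    total = sum(state)
    nxt = []
    for own in state:
        ones = total - own
        zeros = (len(state) - 1) - ones
        nxt.append(1 if ones > zeros else 0 if zeros > ones else own)
    return tuple(nxt)


def settle(state, max_steps=16):
    """Iterate until the state stops changing (or max_steps is reached)."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            break
        state = nxt
    return state


def recovers_from_all_upsets(k, stored_bit):
    """Exhaustively flip every combination of k nodes and check self-recovery."""
    golden = (stored_bit,) * N_NODES
    for nodes in combinations(range(N_NODES), k):
        upset = tuple(b ^ 1 if i in nodes else b for i, b in enumerate(golden))
        if settle(upset) != golden:
            return False
    return True
```

In this toy model single and double upsets are always corrected, while triple upsets are not, which is the kind of verdict an exhaustive enumeration can deliver for a real latch model.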

A Recursive Partition-Based In-Memory SIMD Computation Scheduler for Memory Footprint Minimization

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-12-04 DOI: 10.1109/TCAD.2024.3511337

Xingyue Qian;Chenyang Lv;Zhezhi He;Weikang Qian

{"title":"A Recursive Partition-Based In-Memory SIMD Computation Scheduler for Memory Footprint Minimization","authors":"Xingyue Qian;Chenyang Lv;Zhezhi He;Weikang Qian","doi":"10.1109/TCAD.2024.3511337","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3511337","url":null,"abstract":"In-memory computing (IMC) is a technique that enables memory to perform computation so that data transfer between processor and memory can be reduced, improving energy efficiency. A popular IMC design style is based on the single-instruction-multiple-data (SIMD) concept. The SIMD IMC can implement a high-level function in two steps: 1) synthesis and 2) scheduling. The former converts the high-level function into a netlist of the supported primitive logic operations, while the latter determines the execution sequence of the operations. To fully exploit the advantage of SIMD IMC, it is crucial to find a schedule for the given netlist with less memory usage, known as memory footprint (MF). In this work, we first propose an optimal scheduler that can minimize the MF for small netlists. It is at least 8× faster than the state-of-the-art optimal method. For large netlists, we propose a recursive partition-based scheduler consisting of a scheduling-friendly bipartition algorithm and our optimal scheduler. Compared to four state-of-the-art heuristic methods, ours reduces the MF by 54.7%, 48.9%, 44.0%, and 25.5%, respectively, under the same runtime. Our experiments also demonstrate that our scheduler achieves good end-to-end performance when applied to various IMC architectures. 
The code of our scheduler is made open-source.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2105-2118"},"PeriodicalIF":2.7,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
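Illustrative sketch (not from the paper): the abstract states that the execution order of a netlist's operations determines its memory footprint (MF). The toy below makes that concrete with a brute-force search over all topological orders. The cost model is an assumption: one memory cell per live value, a result needs its cell before its operands are freed, and outputs stay live until the end. The names `memory_footprint`, `is_topological`, and `optimal_schedule` are hypothetical, and the exhaustive search only stands in for the authors' much faster optimal scheduler.

```python
from itertools import permutations


def memory_footprint(netlist, inputs, outputs, order):
    """Peak number of simultaneously live values when gates run in `order`."""
    uses = {v: 0 for v in list(inputs) + list(netlist)}
    for fanins in netlist.values():
        for f in fanins:
            uses[f] += 1
    live = set(inputs)
    peak = len(live)
    for gate in order:
        live.add(gate)                 # result occupies a cell first
        peak = max(peak, len(live))
        for f in netlist[gate]:
            uses[f] -= 1
            if uses[f] == 0 and f not in outputs:
                live.discard(f)        # last use: the cell can be reused
    return peak


def is_topological(netlist, inputs, order):
    """True if every gate runs after all of its fanins are available."""
    done = set(inputs)
    for gate in order:
        if any(f not in done for f in netlist[gate]):
            return False
        done.add(gate)
    return True


def optimal_schedule(netlist, inputs, outputs):
    """Exhaustive search; only feasible for very small netlists."""
    best = None
    for order in permutations(netlist):
        if is_topological(netlist, inputs, order):
            mf = memory_footprint(netlist, inputs, outputs, order)
            if best is None or mf < best[0]:
                best = (mf, order)
    return best


# Toy netlist: gate -> fanins. Input "a" fans out twice; "b" and "c" are used once.
NETLIST = {"u1": ["a"], "u2": ["a"], "v": ["b", "c"],
           "w": ["u1", "u2"], "out": ["w", "v"]}
INPUTS, OUTPUTS = ["a", "b", "c"], {"out"}
```

Running the gates in the listed order peaks at 5 live values, while scheduling `v` first (which frees `b` and `c` early) brings the peak down to 4 under this cost model.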

Parallel Accurate Minifloat MACCs for Neural Network Inference on Versal FPGAs

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-12-04 DOI: 10.1109/TCAD.2024.3511343

Hans Jakob Damsgaard;Konstantin J. Hoßfeld;Jari Nurmi;Thomas B. Preußer

{"title":"Parallel Accurate Minifloat MACCs for Neural Network Inference on Versal FPGAs","authors":"Hans Jakob Damsgaard;Konstantin J. Hoßfeld;Jari Nurmi;Thomas B. Preußer","doi":"10.1109/TCAD.2024.3511343","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3511343","url":null,"abstract":"Machine learning (ML) is ubiquitous in contemporary applications. Its need for efficient acceleration has driven vast research efforts into the quantization of neural networks with low-precision numerical formats. Models quantized with minifloat formats of eight or fewer bits have proven capable of outperforming models quantized into same-size integers. However, unlike integers, minifloats require accurate accumulation to prevent the introduction of rounding errors. We explore the design space of parallel accurate minifloat multiply-accumulators (MACCs) targeting the AMD Versal™ FPGA fabric. We experiment with three variations of the multiply-and-shift and adder tree components of a minifloat MACC. For comparison, we apply similar alterations to a parallel integer MACC. Our results show that custom compressor trees with external sign-inversion gates reduce the mean area of the minifloat MACCs by 17.7% and increase their clock frequency by 16.2%. In comparison, custom compressor trees with absorbed partial product generation gates reduce the mean area of integer MACCs by 28.1% and increase their clock frequency by 3.60%. Comparing the best-performing designs, we observe that minifloat MACCs consume 20% to 180% more resources than integer ones with same-size operands without accounting for a conversion back into a floating-point format, and 60% to 300% more resources when including it. 
Our data enable engineers to make informed decisions in their designs of deeply integrated embedded ML solutions when trading off training and fine-tuning effort versus resource cost.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2181-2194"},"PeriodicalIF":2.7,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10777058","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
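Illustrative sketch (not from the paper): the abstract notes that minifloats need accurate accumulation to avoid rounding errors. One common way to get that is to expand each operand into a fixed-point integer, multiply, and accumulate in a wide integer register. The format below (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, subnormals, no Inf/NaN codes) is an assumption chosen for the example, as are the names `decode_fixed`, `exact_dot`, and `to_float`; the paper's actual formats and datapath may differ.

```python
EXP_BITS, MAN_BITS, BIAS = 4, 3, 7      # assumed E4M3-like layout, no Inf/NaN codes
LSB_EXP = 1 - BIAS - MAN_BITS           # weight of one integer unit: 2**-9


def decode_fixed(code):
    """Decode an 8-bit minifloat into a signed integer in units of 2**LSB_EXP."""
    sign = -1 if (code >> (EXP_BITS + MAN_BITS)) & 1 else 1
    exp = (code >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = code & ((1 << MAN_BITS) - 1)
    if exp == 0:                        # subnormal: no hidden one
        return sign * man
    # Normal: hidden one plus mantissa, shifted left by the exponent.
    return sign * (((1 << MAN_BITS) | man) << (exp - 1))


def exact_dot(a_codes, b_codes):
    """Multiply-accumulate without rounding; result is in units of 2**(2*LSB_EXP)."""
    acc = 0
    for a, b in zip(a_codes, b_codes):
        acc += decode_fixed(a) * decode_fixed(b)
    return acc


def to_float(acc):
    """Convert the integer accumulator back to a real value."""
    return acc * 2.0 ** (2 * LSB_EXP)
```

Because every product is an exact integer, a tiny term survives even when it is accumulated between two large terms that cancel, which is the property a rounding accumulator would lose.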

PauliForest: Connectivity-Aware Synthesis and Pauli-Oriented Qubit Mapping for Near-Term Quantum Simulation

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-12-02 DOI: 10.1109/TCAD.2024.3509794

Yongshang Li;Yu Zhang;Haoning Deng;Mingyu Chen;Zhenyu Li

{"title":"PauliForest: Connectivity-Aware Synthesis and Pauli-Oriented Qubit Mapping for Near-Term Quantum Simulation","authors":"Yongshang Li;Yu Zhang;Haoning Deng;Mingyu Chen;Zhenyu Li","doi":"10.1109/TCAD.2024.3509794","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3509794","url":null,"abstract":"Quantum simulation is the foundation for the design of many algorithms which share subroutines known as quantum simulation kernels. Optimizing the compilation of these kernels is crucial, involving two key components: 1) circuit synthesis and 2) qubit mapping. However, existing circuit synthesis methods either overlook qubit connectivity constraints (QCCs) or prioritize minimizing gate count over optimizing circuit depth. Similarly, current qubit mapping techniques do not work well with circuit synthesis methods. To address these limitations, we propose PauliForest, which comprises a connectivity-aware circuit synthesis algorithm and a Pauli-oriented qubit mapping algorithm. The synthesis algorithm employs heuristic strategies to generate shallower circuits, while the qubit mapping algorithm seamlessly collaborates with the circuit synthesis process. Compared to the state-of-the-art Paulihedral compiler, our approach significantly reduces both CNOT gate counts (by 13%) and circuit depths (by 25%). 
Experiments on a noisy simulator and a real superconducting quantum computer show that our algorithm can improve the fidelity of quantum circuit execution compared to Paulihedral.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2119-2129"},"PeriodicalIF":2.7,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
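Illustrative sketch (not from the paper): the abstract says existing synthesis methods may overlook qubit connectivity constraints. The toy below shows the textbook synthesis of a single Pauli rotation exp(-i·θ/2·P) with a linear CNOT chain, which ignores connectivity, and then counts how many of its CNOTs fall on qubit pairs that are not coupled. This is the baseline problem only, not PauliForest's algorithm; the gate-tuple encoding and the names `synthesize_pauli_rotation` and `uncoupled_cnots` are hypothetical.

```python
def synthesize_pauli_rotation(pauli, theta):
    """Gate list for exp(-i*theta/2 * P) using a linear CNOT chain.

    `pauli` is a string over I, X, Y, Z, one character per qubit.
    Connectivity is deliberately ignored here.
    """
    active = [q for q, p in enumerate(pauli) if p != "I"]
    if not active:
        return []                              # identity string: global phase only
    pre, post = [], []
    for q in active:
        if pauli[q] == "X":                    # H maps X to Z
            pre.append(("h", q))
            post.append(("h", q))
        elif pauli[q] == "Y":                  # S-dagger then H maps Y to Z
            pre += [("sdg", q), ("h", q)]
            post += [("h", q), ("s", q)]
    # Accumulate the parity of all active qubits onto the last one.
    chain = [("cx", a, b) for a, b in zip(active, active[1:])]
    rotation = [("rz", active[-1], theta)]
    return pre + chain + rotation + chain[::-1] + post


def uncoupled_cnots(gates, coupling):
    """Count CNOTs acting on qubit pairs that are not edges of `coupling`."""
    edges = {frozenset(e) for e in coupling}
    return sum(1 for g in gates
               if g[0] == "cx" and frozenset(g[1:3]) not in edges)
```

For the string "XIZY" on a 4-qubit line (0-1, 1-2, 2-3), the chain uses 4 CNOTs, 2 of which act on the uncoupled pair (0, 2) and would need routing, which is the overhead connectivity-aware synthesis aims to avoid.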

Efficient Resubstitution-Based Approximate Logic Synthesis

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-12-02 DOI: 10.1109/TCAD.2024.3510513

Chang Meng;Alan Mishchenko;Weikang Qian;Giovanni De Micheli

Citations: 0

PDNNet: PDN-Aware GNN-CNN Heterogeneous Network for Dynamic IR Drop Prediction

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-29 DOI: 10.1109/TCAD.2024.3509796

Yuxiang Zhao;Zhuomin Chai;Xun Jiang;Yibo Lin;Runsheng Wang;Ru Huang

{"title":"PDNNet: PDN-Aware GNN-CNN Heterogeneous Network for Dynamic IR Drop Prediction","authors":"Yuxiang Zhao;Zhuomin Chai;Xun Jiang;Yibo Lin;Runsheng Wang;Ru Huang","doi":"10.1109/TCAD.2024.3509796","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3509796","url":null,"abstract":"IR drop on the power delivery network (PDN) is closely related to PDN’s configuration and cell current consumption. As integrated circuit (IC) designs grow larger, dynamic IR drop simulation becomes computationally unaffordable and machine learning-based IR drop prediction has been explored as a promising solution. Although convolutional neural network (CNN)-based methods have been adapted to the IR drop prediction task in several works, the shortcoming of overlooking PDN configuration is non-negligible. In this article, we consider not only how to properly represent cell-PDN relation, but also how to model IR drop following its physical nature in the feature aggregation procedure. Thus, we propose a novel graph structure, PDNGraph, to unify the representations of the PDN structure and the fine-grained cell-PDN relation. We further propose a dual-branch heterogeneous network, PDNNet, incorporating two parallel GNN-CNN branches to favorably capture the above features during the learning process. Several key designs are presented to make the dynamic IR drop prediction highly effective and interpretable. This is the first work to apply a graph structure to deep-learning-based dynamic IR drop prediction. 
Experiments show that PDNNet outperforms the state-of-the-art CNN-based methods and achieves a 545× speedup compared to the commercial tool, which demonstrates the superiority of our method.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2253-2263"},"PeriodicalIF":2.7,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Nonintrusive Data-Driven Model Order Reduction for Circuits Based on Hammerstein Architectures

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-29 DOI: 10.1109/TCAD.2024.3509797

Joshua Hanson;Paul Kuberry;Biliana Paskaleva;Pavel Bochev

Citations: 0

Hierarchical Model Checking of SystemVerilog-Specified Asynchronous Circuits for Deadlock Detection

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-29 DOI: 10.1109/TCAD.2024.3509798

Longlong Lu;Minxue Pan;Yifei Lu;Xuandong Li

{"title":"Hierarchical Model Checking of SystemVerilog-Specified Asynchronous Circuits for Deadlock Detection","authors":"Longlong Lu;Minxue Pan;Yifei Lu;Xuandong Li","doi":"10.1109/TCAD.2024.3509798","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3509798","url":null,"abstract":"Specifying channel-based asynchronous circuits in SystemVerilog is a promising alternative design paradigm to combine the advantages of asynchronous circuits and industrial electronic design automation support. However, communicating through channels can be error-prone, potentially introducing deadlocks that cannot be detected easily through simulation. In contrast, model checking can reliably identify deadlocks, but faces challenges related to scalability and modeling capability. This research proposes a novel model checking approach, named Verilock, to detect deadlocks of channel-based asynchronous circuits specified in SystemVerilog. To address the issue of modeling capability, Verilock extracts intermodule communication behavior from SystemVerilog circuit designs and builds models in communication protocols specifically designed for this purpose. Additionally, Verilock employs a novel hierarchical model checking algorithm that conducts localized verification of well-formed groups of the system from the bottom up, thus reducing the size of the checking problems and presenting the opportunity to parallelize the checking process. Extensive experimental evaluations confirm the efficiency of Verilock on publicly accessible and randomly synthesized large-scale asynchronous circuits. 
Remarkably, significant benefits of the hierarchical checking approach are demonstrated through an ablative experiment.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2424-2437"},"PeriodicalIF":2.7,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144100062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
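Illustrative sketch (not from the paper): the abstract contrasts simulation with model checking for finding deadlocks in channel-based communication. The toy below performs a flat (non-hierarchical) explicit-state search: each process is a fixed sequence of blocking sends and receives on rendezvous channels, and a state is a deadlock if some process is unfinished but no send/receive pair can fire. The process encoding and the name `find_deadlock` are hypothetical; Verilock's SystemVerilog extraction and hierarchical algorithm are not reproduced here.

```python
from collections import deque


def find_deadlock(processes):
    """Breadth-first search over program-counter tuples.

    `processes` maps a name to a list of ("send" | "recv", channel) actions.
    Returns the program counters of a deadlocked state, or None.
    """
    names = list(processes)
    start = (0,) * len(names)
    seen, queue = {start}, deque([start])
    while queue:
        pcs = queue.popleft()
        # Next pending action of every process that has not finished yet.
        pending = {i: processes[n][pc]
                   for i, (n, pc) in enumerate(zip(names, pcs))
                   if pc < len(processes[n])}
        successors = []
        for i, (kind_i, chan_i) in pending.items():
            for j, (kind_j, chan_j) in pending.items():
                if (i != j and kind_i == "send" and kind_j == "recv"
                        and chan_i == chan_j):
                    nxt = list(pcs)
                    nxt[i] += 1            # sender advances
                    nxt[j] += 1            # receiver advances
                    successors.append(tuple(nxt))
        if pending and not successors:
            return dict(zip(names, pcs))   # someone is blocked forever
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None


# Both processes send first, so neither send can ever complete.
BAD = {"P": [("send", "a"), ("recv", "b")],
       "Q": [("send", "b"), ("recv", "a")]}
# Swapping Q's actions lets the handshakes pair up.
GOOD = {"P": [("send", "a"), ("recv", "b")],
        "Q": [("recv", "a"), ("send", "b")]}
```

The state space of such a flat search grows with the product of all process lengths, which is the scalability problem that motivates checking well-formed groups locally and bottom-up.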

Automated Design for Multiorgan-on-Chip Geometries

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-29 DOI: 10.1109/TCAD.2024.3509795

Maria Emmerich;Philipp Ebner;Robert Wille

{"title":"Automated Design for Multiorgan-on-Chip Geometries","authors":"Maria Emmerich;Philipp Ebner;Robert Wille","doi":"10.1109/TCAD.2024.3509795","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3509795","url":null,"abstract":"Multiorgans-on-chips (multi-OoCs) represent human or other animal physiology on a chip, providing testing platforms for the pharmaceutical, cosmetic, and chemical industries. They are composed of miniaturized organ tissues (so-called organ modules) that are connected via a microfluidic channel network and, by this, represent organ functionalities and their interactions on-chip. The design of these multi-OoC geometries, however, requires a sophisticated orchestration of numerous aspects, such as the size of organ modules, the required shear stress on membranes and subsequently the flow rate, the dimensions and geometry of channels, pump pressures, etc. Mastering all this constitutes a nontrivial design task for which, unfortunately, no automatic support exists yet. In this work, we propose a design automation solution for multi-OoC geometries. To this end, we review the respective design steps and derive a corresponding formal design specification from them. Based on that, we then propose an automatic design tool, which generates a design of the desired device and exports it in a fashion that is ready for subsequent simulation or fabrication. The open-source tool and a step-by-step tutorial are available at https://github.com/cda-tum/mmft-ooc-designer. 
Evaluations (inspired by real-world use cases and confirmed by computational fluid dynamic simulations as well as a fabrication process) demonstrate the applicability and validity of the proposed approach.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2287-2299"},"PeriodicalIF":2.7,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10771959","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0
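Illustrative sketch (not from the paper): the abstract lists shear stress, flow rate, channel dimensions, and pump pressure as coupled design quantities. The toy below links them with standard laminar-flow approximations for a shallow rectangular channel: wall shear stress τ ≈ 6μQ/(w·h²) and hydraulic resistance R ≈ 12μL / (w·h³·(1 − 0.63·h/w)), both valid for h < w. Whether the mmft-ooc-designer tool uses these exact formulas is an assumption; the function names and all numeric values are hypothetical, in SI units.

```python
def flow_rate_for_shear(tau, width, height, viscosity):
    """Flow rate Q (m^3/s) giving wall shear stress tau (Pa) in a shallow channel."""
    return tau * width * height ** 2 / (6.0 * viscosity)


def hydraulic_resistance(length, width, height, viscosity):
    """Approximate resistance (Pa*s/m^3) of a rectangular channel with height < width."""
    return 12.0 * viscosity * length / (
        width * height ** 3 * (1.0 - 0.63 * height / width))


def pump_pressure(tau, length, width, height, viscosity):
    """Pressure drop (Pa) needed to drive the flow that yields shear stress tau."""
    q = flow_rate_for_shear(tau, width, height, viscosity)
    return hydraulic_resistance(length, width, height, viscosity) * q


# Hypothetical example: 1 Pa shear, 1 mm x 0.1 mm channel, 10 mm long, water-like fluid.
Q = flow_rate_for_shear(tau=1.0, width=1e-3, height=1e-4, viscosity=1e-3)
DP = pump_pressure(tau=1.0, length=1e-2, width=1e-3, height=1e-4, viscosity=1e-3)
```

Fixing the target shear stress determines the flow rate, and the flow rate together with the channel geometry then determines the pressure the pump must supply, which is why these parameters cannot be chosen independently.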

Energy-Aware Heterogeneous Federated Learning via Approximate DNN Accelerators

IF 2.7, Zone 3, Computer Science

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-29 DOI: 10.1109/TCAD.2024.3509793

Kilian Pfeiffer;Konstantinos Balaskas;Kostas Siozios;Jörg Henkel

{"title":"Energy-Aware Heterogeneous Federated Learning via Approximate DNN Accelerators","authors":"Kilian Pfeiffer;Konstantinos Balaskas;Kostas Siozios;Jörg Henkel","doi":"10.1109/TCAD.2024.3509793","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3509793","url":null,"abstract":"In Federated Learning (FL), devices that participate in the training usually have heterogeneous resources, i.e., energy availability. In current deployments of FL, devices that do not fulfill certain hardware requirements are often dropped from the collaborative training. However, dropping devices in FL can degrade training accuracy and introduce bias or unfairness. Several works have tackled this problem on an algorithm level, e.g., by letting constrained devices train a subset of the server neural network (NN) model. However, it has been observed that these techniques are not effective w.r.t. accuracy. Importantly, they make simplistic assumptions about devices’ resources via indirect metrics, such as multiply-accumulate (MAC) operations or peak memory requirements. We observe that memory access costs (that are currently not considered in simplistic metrics) have a significant impact on energy consumption. In this work, for the first time, we consider on-device accelerator design for FL with heterogeneous devices. We utilize compressed arithmetic formats and approximate computing, aiming to satisfy limited energy budgets. 
Using a hardware-aware energy model, we observe that, contrary to the state of the art’s moderate energy reduction, our technique allows for lowering the energy requirements (by 4×) while maintaining higher accuracy.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 6","pages":"2054-2066"},"PeriodicalIF":2.7,"publicationDate":"2024-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144108353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0