IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems: Latest Articles

RaPC: Raw Bit Error Rate Aware Polar Coding for 3-D NAND Flash Memory
IF 2.9, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-10 DOI: 10.1109/TCAD.2025.3540375
Ruifeng Tu;Meng Zhang;Changsheng Xie;Fei Wu
{"title":"RaPC: Raw Bit Error Rate Aware Polar Coding for 3-D NAND Flash Memory","authors":"Ruifeng Tu;Meng Zhang;Changsheng Xie;Fei Wu","doi":"10.1109/TCAD.2025.3540375","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3540375","url":null,"abstract":"Reliability challenges such as random telegraph noise (RTN) and intercell electrostatic interference have worsened as feature sizes in planar NAND flash memory continue to shrink. To improve storage capacity, 3-D stacking of NAND flash memory has emerged as the preferred development path. However, the switch to 3-D NAND flash brings additional challenges, such as shorter lifespans and lower reliability as a result of higher integration densities and intricate vertical interference. This article proposes RaPC: a raw bit error rate (RBER) aware polar coding scheme for improving data reliability of 3-D NAND flash memory. The error correction capability of the polar code is dynamically adjusted according to the variation of the RBER, which ensures reliability and reduces decoding delay. 
Simulation results demonstrate that RaPC offers significant advantages in decoding latency and performance over conventional low-density parity-check (LDPC) codes within specific RBER ranges, making it a promising solution for enhancing the reliability of 3-D NAND flash memory.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 9","pages":"3546-3559"},"PeriodicalIF":2.9,"publicationDate":"2025-02-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144887682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
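The RBER-aware idea in the RaPC abstract, adjusting the polar code's correction capability as the raw bit error rate changes, can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes a binary symmetric channel whose crossover probability equals the RBER, and it uses the standard Bhattacharyya-parameter recursion, which is exact only for erasure channels and serves here as a heuristic. The function names `bhattacharyya_bounds` and `info_bit_count` and the `target` threshold are hypothetical.

```python
import math


def bhattacharyya_bounds(rber: float, n: int) -> list[float]:
    """Heuristic reliability estimates for the 2**n synthetic polar channels."""
    # Bhattacharyya parameter of a binary symmetric channel with crossover rber.
    z = [2.0 * math.sqrt(rber * (1.0 - rber))]
    for _ in range(n):
        nxt = []
        for v in z:
            nxt.append(min(1.0, 2.0 * v - v * v))  # degraded ("minus") channel
            nxt.append(v * v)                      # upgraded ("plus") channel
        z = nxt
    return z


def info_bit_count(rber: float, n: int = 10, target: float = 1e-3) -> int:
    """Count the synthetic channels treated as reliable enough to carry data.

    The threshold `target` is an illustrative assumption, not a value from the paper.
    """
    return sum(1 for v in bhattacharyya_bounds(rber, n) if v < target)
```

Under these assumptions, a higher RBER leaves fewer channels below the threshold, so more positions are frozen and the code rate drops. That is one simple way to trade capacity for correction strength as a block wears out.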
Temperature Effects of Program Operation in 3-D NAND Flash Memory: Observations, Analysis, and Solutions
IF 2.9, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-07 DOI: 10.1109/TCAD.2025.3539982
Hua Feng;Debao Wei;Qi Wang;Yongchao Wang;Liyan Qiao;Zongliang Huo
{"title":"Temperature Effects of Program Operation in 3-D NAND Flash Memory: Observations, Analysis, and Solutions","authors":"Hua Feng;Debao Wei;Qi Wang;Yongchao Wang;Liyan Qiao;Zongliang Huo","doi":"10.1109/TCAD.2025.3539982","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539982","url":null,"abstract":"As flash memory storage density continues to increase, it has become the mainstream storage medium for electronic devices. Writing data in a low-temperature environment distorts the flash memory threshold voltage distribution (TVD), which sharply raises the raw bit error rate and ultimately degrades the performance of flash-based electronic devices. To mitigate the reliability problem caused by variations in flash memory read and program temperature, this study proposes a flash memory programming temperature compensation algorithm based on read reference voltage (PTC-RRV) calibration. 3-D triple-level cell (TLC) flash memory is currently the mainstream storage medium for consumer electronics. Based on a large number of real tests on chips of this type, the relationship between the programming/reading temperature and the TVD of flash memory is fully characterized, and a programming temperature compensation model is constructed. 
The model evaluation results show that the PTC-RRV strategy significantly reduces the average number of read retries for data written at low temperature and effectively improves the storage reliability and read performance of flash memory; its optimization effect on electronic devices is better than that of existing temperature compensation algorithms.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 9","pages":"3313-3322"},"PeriodicalIF":2.9,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144887692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Re-CATA: Real-Time and Flexible Accelerator Design Framework for On-Device Codec Avatars
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-06 DOI: 10.1109/TCAD.2025.3539600
Yongan Zhang;Yuecheng Li;Syed Shakib Sarwar;H. Ekin Sumbul;Yonggan Fu;Haoran You;Cheng Wan;Yingyan Lin
{"title":"Re-CATA: Real-Time and Flexible Accelerator Design Framework for On-Device Codec Avatars","authors":"Yongan Zhang;Yuecheng Li;Syed Shakib Sarwar;H. Ekin Sumbul;Yonggan Fu;Haoran You;Cheng Wan;Yingyan Lin","doi":"10.1109/TCAD.2025.3539600","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539600","url":null,"abstract":"Real-time Codec Avatars, which employ deep generative models for 3-D reconstruction of human features, are crucial for immersive telepresence in augmented reality and virtual reality (AR/VR) environments. However, deploying these avatars in real-time on AR/VR headsets is challenging due to the inability of existing devices to achieve satisfying performance within stringent hardware resource constraints. To address these challenges, we introduce Re-CATA, an innovative full-stack and flexible Codec Avatar accelerator design framework. Re-CATA is designed to deliver real-time throughput (greater than 120 FPS) for the complete Codec Avatar processing pipeline within an edge-level power budget of 5 W under FPGA prototyping. Our approach begins by abstracting the operation mapping and scheduling challenges inherent in Codec Avatars, which require both centralized and distributed processing to handle dynamically changing workloads. We propose a novel hardware resource and workload partitioning scheme optimized for these fluctuating demands. To complement this, we introduce an agile runtime scheduling system for efficient workload reallocation among computing units as needed, recognizing the limitations of static partitioning in rapidly evolving workload scenarios. Furthermore, our micro-architecture design incorporates unified computing modules and efficient hardware peripherals, enabling seamless workload balancing across the Codec Avatar processing pipeline. We evaluate the Re-CATA accelerators via on-board FPGA prototyping, comparing them to various baselines, including commercial AR/VR system-on-chips and academic accelerators. 
This evaluation demonstrates a speedup of up to 5.95× under similar settings.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3020-3033"},"PeriodicalIF":2.7,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Single-Pass: An Operation Unit-Based In-Memory Computing Architecture for Sparse Neural Networks
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-06 DOI: 10.1109/TCAD.2025.3539592
Shang Wang;Qi Cao;Yongqiang Wang;Hang Chen;Zhenjiao Chen;Feng Liang
{"title":"Single-Pass: An Operation Unit-Based In-Memory Computing Architecture for Sparse Neural Networks","authors":"Shang Wang;Qi Cao;Yongqiang Wang;Hang Chen;Zhenjiao Chen;Feng Liang","doi":"10.1109/TCAD.2025.3539592","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539592","url":null,"abstract":"Compute-in-memory (CIM) has emerged as a prominent research focus in recent years, offering a promising alternative for advancing traditional von Neumann architecture computers. However, the extensive array structures and peripheral circuits inherent in CIM introduce challenges related to latency and power consumption. The operation unit (OU) has gained attention as a practical solution to these issues, significantly transforming the computational paradigm of in-memory computing. Despite its potential, the possibilities enabled by this approach remain underexplored. This article presents a novel architecture, single-pass, designed around OU implementation with a new OU partitioning method optimized for sparse networks. Additionally, we propose a matrix compression technique leveraging a dual heuristic greedy algorithm (DHGA), forming the foundation of our architecture-specific mapping strategy. 
Experimental results demonstrate that, within given area constraints, our architecture achieves an average energy efficiency improvement of 29.8% and a speedup of 82.3% across various networks compared to the baseline.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"2952-2965"},"PeriodicalIF":2.7,"publicationDate":"2025-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Shallow Quantum Circuit Implementation of Symmetric Functions With Limited Ancillary Qubits
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-05 DOI: 10.1109/TCAD.2025.3539002
Wei Zi;Junhong Nie;Xiaoming Sun
{"title":"Shallow Quantum Circuit Implementation of Symmetric Functions With Limited Ancillary Qubits","authors":"Wei Zi;Junhong Nie;Xiaoming Sun","doi":"10.1109/TCAD.2025.3539002","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539002","url":null,"abstract":"Optimizing the depth and number of ancillary qubits in quantum circuits is crucial in quantum computation, given the limitations imposed by current quantum devices. In this article, we introduce an innovative approach for implementing arbitrary symmetric Boolean functions using poly-logarithmic depth quantum circuits with only a logarithmic number of ancillary qubits. Symmetric functions are those whose outputs are dictated solely by the Hamming weight of the inputs. These functions find applications across various domains, including quantum machine learning and arithmetic circuit synthesis. Moreover, by fully leveraging the potential of qutrits, the ancilla count can be further reduced to just one. The key technique involves a novel poly-logarithmic depth quantum circuit designed to compute Hamming weight without the need for ancillary qubits. This quantum circuit for Hamming weight is of independent interest due to its wide-ranging applications, such as in quantum memory, quantum machine learning, and Hamiltonian dynamics simulations.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3060-3072"},"PeriodicalIF":2.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657448","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
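The definition used in the abstract above, that a symmetric Boolean function depends only on the Hamming weight of its input, can be made concrete with a small classical sketch. It says nothing about the quantum circuit construction itself; the names `hamming_weight`, `eval_symmetric`, `PARITY_3`, and `MAJORITY_3` are chosen here for illustration.

```python
from typing import Sequence


def hamming_weight(bits: Sequence[int]) -> int:
    """Number of ones in the input."""
    return sum(1 for b in bits if b)


def eval_symmetric(value_vector: Sequence[int], bits: Sequence[int]) -> int:
    """Evaluate a symmetric Boolean function given as f(weight) for weight = 0..n."""
    if len(value_vector) != len(bits) + 1:
        raise ValueError("value_vector needs one entry per possible Hamming weight")
    return value_vector[hamming_weight(bits)]


# Two symmetric functions on n = 3 inputs, indexed by Hamming weight 0..3.
PARITY_3 = [0, 1, 0, 1]
MAJORITY_3 = [0, 0, 1, 1]
```

Any symmetric function on n inputs is fully described by n + 1 output values, which is why computing the Hamming weight is the central step: once the weight is known, the output is a table lookup.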
Device Design Guidelines to Boost Up AC Performance of CFET (Complementary Field-Effect-Transistor)-Based Inverter
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-05 DOI: 10.1109/TCAD.2025.3539599
Jaehyuk Lim;Donghwan Han;Juho Sung;Seokchan Yoon;Sanghyun Kang;Gwon Kim;Hyoung Won Baac;Changhwan Shin
{"title":"Device Design Guidelines to Boost Up AC Performance of CFET (Complementary Field-Effect-Transistor)-Based Inverter","authors":"Jaehyuk Lim;Donghwan Han;Juho Sung;Seokchan Yoon;Sanghyun Kang;Gwon Kim;Hyoung Won Baac;Changhwan Shin","doi":"10.1109/TCAD.2025.3539599","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3539599","url":null,"abstract":"Complementary field-effect transistors (CFETs) have emerged as promising candidates for next-generation semiconductor devices. CFETs feature a structure with an nMOS (or pMOS) transistor at the bottom and a transistor of the opposite type at the top. CFETs can be classified into Fin-CFETs or GAA-CFETs based on their channel structure. In this study, we compare and analyze these two devices to determine which structure is more favorable for device scaling and which device exhibits better performance per unit area. For a reliable analysis, the threshold voltage was adjusted to be the same for all devices. Initially, to compare the DC performance, the on-state drive currents in both linear mode and saturation mode operations were extracted and compared from the $I_{DS}$-versus-$V_{GS}$ input-transfer characteristics. Subsequently, complementary metal-oxide-semiconductor inverters were constructed to compare their AC performance. 
Six parameters were extracted and compared: high-to-low propagation delay ($t_{pHL}$), falling time ($t_{f}$), low-to-high propagation delay ($t_{pLH}$), rising time ($t_{r}$), overshoot voltage ($V_{ov}$), and undershoot voltage ($V_{und}$). Based on the results, we suggest which CFET structure is more suitable for device scaling.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3189-3196"},"PeriodicalIF":2.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144663730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
RoboSpike: Fully Utilizing the Heterogeneous System With Subcallback Scheduling in ROS 2
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-04 DOI: 10.1109/TCAD.2025.3538615
Hongyi Li;Qingyuan Yang;Songchen Ma;Rong Zhao;Xinglong Ji
{"title":"RoboSpike: Fully Utilizing the Heterogeneous System With Subcallback Scheduling in ROS 2","authors":"Hongyi Li;Qingyuan Yang;Songchen Ma;Rong Zhao;Xinglong Ji","doi":"10.1109/TCAD.2025.3538615","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3538615","url":null,"abstract":"The advancement in artificial intelligence (AI) has greatly propelled the development of robotics, requiring the adoption of heterogeneous computing architectures with multicore CPUs, GPUs, and accelerators to meet the growing computational needs of edge computing. Such heterogeneity, coupled with the inherently I/O-intensive nature of robotic applications, poses substantial challenges for task scheduling and resource management. These challenges are particularly acute for systems striving to maximize computational resource utilization, and they cannot be effectively addressed through callback-level scheduling. To overcome these obstacles, we developed RoboSpike, a systematic solution built on the Robot Operating System 2 (ROS 2). We first implemented a subcallback scheduling mechanism that uses coroutines to put to work CPUs that would otherwise sit blocked waiting for I/O operations. Building on this mechanism, we extended the design to incorporate the coprocessor and introduced an auto-tuning algorithm to adapt to system performance variations. Finally, we performed a response time analysis to ensure that RoboSpike is temporally predictable. The evaluation results demonstrate that RoboSpike achieves substantial improvements, increasing throughput by 1.65–2.25 times in real-world scenarios. 
RoboSpike enhances the scheduling capabilities of ROS 2 by refining the granularity from the callback level, thus opening up new opportunities for performance improvement in robotic systems, especially in resource-limited scenarios with complex workloads.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"2897-2910"},"PeriodicalIF":2.7,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
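The subcallback idea in the RoboSpike abstract, splitting a callback at its I/O waits so the CPU can run other work in the meantime, can be sketched with plain Python coroutines. This is only an analogy built on `asyncio`; it does not use any ROS 2 API and is not the authors' scheduler. The callback names and sleep durations are made up for illustration.

```python
import asyncio


async def callback(name: str, io_seconds: float, log: list[str]) -> None:
    """One callback split into two compute segments around a simulated I/O wait."""
    log.append(f"{name}:compute-1")      # first compute segment
    await asyncio.sleep(io_seconds)      # simulated I/O wait; yields the CPU
    log.append(f"{name}:compute-2")      # second segment resumes after the wait


async def run_callbacks() -> list[str]:
    """Run two callbacks on a single thread, interleaving them at await points."""
    log: list[str] = []
    await asyncio.gather(
        callback("camera", 0.05, log),
        callback("lidar", 0.01, log),
    )
    return log
```

Calling `asyncio.run(run_callbacks())` shows the second callback starting while the first is still waiting. With callback-level scheduling, by contrast, the thread would stay blocked until the first callback returned.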
Coexisting Hyperchaos in a Memristive Neuromorphic Oscillator
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-04 DOI: 10.1109/TCAD.2025.3538692
Xin Zhang;Chunbiao Li;Tengfei Lei;Herbert Ho-Ching Iu;Tomasz Kapitaniak
{"title":"Coexisting Hyperchaos in a Memristive Neuromorphic Oscillator","authors":"Xin Zhang;Chunbiao Li;Tengfei Lei;Herbert Ho-Ching Iu;Tomasz Kapitaniak","doi":"10.1109/TCAD.2025.3538692","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3538692","url":null,"abstract":"Memristors have been widely integrated into neurons as the bridge for introducing external magnetic induction currents. The complex oscillation induced by external magnetic stimulation is a hot topic in neuron dynamics. When a memristor is introduced into the Hindmarsh-Rose (HR) neuron to simulate the external magnetic field, a novel memristive neuromorphic hyperchaotic oscillator is constructed. The memristor weight can trigger complex neuronal firing dynamics, including rare hyperchaotic bursting. Furthermore, when the technique of offset boosting-oriented attractor doubling is employed, a double-scroll hyperchaotic attractor can be generated, which can split into three independent coexisting attractors under some specific offsets. More interestingly, two symmetric periodic attractors and two symmetric hyperchaotic attractors can coexist under certain conditions. In this work, a neuron with coexisting hyperchaotic attractors is constructed and exhaustively explored, which provides a good candidate for constituting large-scale brain-like neuromorphic oscillators. 
A PCB-based hardware circuit produces the oscillations validating the numerical simulations and theoretical analyses.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3179-3188"},"PeriodicalIF":2.7,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144663729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Reducing Transistor Count in CMOS Logic Design Through Clustering and Library-Independent Multiple-Output Logic Synthesis
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-02-04 DOI: 10.1109/TCAD.2025.3538492
Anup Kumar Biswas;Dimitri Kagaris
{"title":"Reducing Transistor Count in CMOS Logic Design Through Clustering and Library-Independent Multiple-Output Logic Synthesis","authors":"Anup Kumar Biswas;Dimitri Kagaris","doi":"10.1109/TCAD.2025.3538492","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3538492","url":null,"abstract":"We propose a novel transistor-level synthesis method to minimize the number of transistors needed to implement a digital circuit. In contrast with traditional standard cell design methods or transistor-level synthesis methods based on single-input “complex” gates or “super” gates, our method considers multioutput clusters as the basic resynthesis unit. Our tool takes any gate-level circuit netlist as input and divides it into several clusters of user-controlled size. For each output of a cluster, a simplified sum-of-products (SOP) expression is obtained, and all such expressions are jointly minimized for the cluster using the MOTO-X multioutput transistor-level synthesis tool. Then, we consider groups of clusters, referred to as “superclusters,” to collectively reduce the overall transistor count. Experimental results indicate average transistor count reductions compared to the ABC synthesis tool of 9.95%, 6.53%, 10.49%, 13.09%, and 9.76% for the ISCAS’85, LGSynth’89, LGSynth’91, EPFL’15, and ITC’99 benchmark suites, respectively. 
Furthermore, our proposed approach proves to be more efficient than the transistor-mapped binary decision diagram approach, highlighting the potential of our methodology for optimizing integrated circuits at the transistor-level while delivering enhancements in power efficiency and demonstrating varied improvements in delay performance.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"3046-3059"},"PeriodicalIF":2.7,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DynMap: A Heuristic Dynamic Mapper for CGRA Multitask Dynamic Resource Allocation
IF 2.7, Zone 3, Computer Science
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2025-01-31 DOI: 10.1109/TCAD.2025.3537975
Yufei Yang;Chenhao Xie;Liansheng Liu;Xiyuan Peng
{"title":"DynMap: A Heuristic Dynamic Mapper for CGRA Multitask Dynamic Resource Allocation","authors":"Yufei Yang;Chenhao Xie;Liansheng Liu;Xiyuan Peng","doi":"10.1109/TCAD.2025.3537975","DOIUrl":"https://doi.org/10.1109/TCAD.2025.3537975","url":null,"abstract":"Coarse-grained reconfigurable architecture (CGRA) has received increasing attention in both industry and academia due to its combined advantages in performance, energy efficiency, and flexibility. To improve resource utilization and handle the mixed workloads of the real world, having multiple tasks share the whole CGRA has become an important technical trend, and the varying resource requirements of tasks throughout their life cycles also make run-time dynamic resource allocation (DRA) necessary for higher multitask throughput. As the key stage of DRA, dynamic mapping (DM) is responsible for mapping kernels within each task to the dynamically allocated CGRA resources. However, existing DM methods have difficulty balancing mapping time and mapping quality, resulting in a significant gap between the actual and the optimal task throughput. To address this challenge, we propose DynMap, a heuristic dynamic mapper for CGRA multitask DRA. With the support of specialized scheduling and routing schemes, DynMap heuristically references the placement tendency in the static mapping result to dramatically reduce the mapping time, while maintaining high mapping quality by minimizing the possibility of resource conflicts. 
Experimental evaluation demonstrates that DynMap not only achieves an average mapping time of 1.17 ms and an average of 98.33% of the optimal mapping quality on different CGRA architectures, but also reaches an average of 98.85% of the optimal task throughput expected by different CGRA multitask DRA scenarios, leaving a gap between actual and optimal task throughput that is on average 31.75× smaller than that of current methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"44 8","pages":"2979-2991"},"PeriodicalIF":2.7,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144657381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0