IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems最新文献

筛选
英文 中文
Backdoor Attacks on Safe Reinforcement Learning-Enabled Cyber–Physical Systems 对安全强化学习网络物理系统的后门攻击
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3447468
Shixiong Jiang;Mengyu Liu;Fanxin Kong
{"title":"Backdoor Attacks on Safe Reinforcement Learning-Enabled Cyber–Physical Systems","authors":"Shixiong Jiang;Mengyu Liu;Fanxin Kong","doi":"10.1109/TCAD.2024.3447468","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447468","url":null,"abstract":"Safe reinforcement learning (RL) aims to derive a control policy that navigates a safety-critical system while avoiding unsafe explorations and adhering to safety constraints. While safe RL has been extensively studied, its vulnerabilities during the policy training have barely been explored in an adversarial setting. This article bridges this gap and investigates the training time vulnerability of formal language-guided safe RL. Such vulnerability allows a malicious adversary to inject backdoor behavior into the learned control policy. First, we formally define backdoor attacks for safe RL and divide them into active and passive ones depending on whether to manipulate the observation. Second, we propose two novel algorithms to synthesize the two kinds of attacks, respectively. Both algorithms generate backdoor behaviors that may go unnoticed after deployment but can be triggered when specific states are reached, leading to safety violations. Finally, we conduct both theoretical analysis and extensive experiments to show the effectiveness and stealthiness of our methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4093-4104"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Bank on Compute-Near-Memory: Design Space Exploration of Processing-Near-Bank Architectures 计算-近存银行:处理近存储架构的设计空间探索
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3442989
Rafael Medina;Giovanni Ansaloni;Marina Zapater;Alexandre Levisse;Saeideh Alinezhad Chamazcoti;Timon Evenblij;Dwaipayan Biswas;Francky Catthoor;David Atienza
{"title":"Bank on Compute-Near-Memory: Design Space Exploration of Processing-Near-Bank Architectures","authors":"Rafael Medina;Giovanni Ansaloni;Marina Zapater;Alexandre Levisse;Saeideh Alinezhad Chamazcoti;Timon Evenblij;Dwaipayan Biswas;Francky Catthoor;David Atienza","doi":"10.1109/TCAD.2024.3442989","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3442989","url":null,"abstract":"Near-DRAM computing strategies advocate for providing computational capabilities close to where data is stored. Although this paradigm can effectively address the memory-to-processor communication bottleneck, it also presents new challenges: The strict resource constraints in the memory periphery demand careful tailoring of architectural elements. We herein propose a novel framework and methodology to explore compute-near-memory designs that interface to DRAM memory banks, demonstrating the area, energy, and performance tradeoffs subject to the architectural configuration. We exemplify this methodology by conducting two studies on compute-near-bank designs: 1) analyzing the interaction between control and data resources, and 2) exploring the integration of processing units with different DRAM standards. According to our study, the optimal size ratios between instruction and data capacity vary from \u0000<inline-formula> <tex-math>$2times $ </tex-math></inline-formula>\u0000 to \u0000<inline-formula> <tex-math>$4times $ </tex-math></inline-formula>\u0000 across benchmarks from representative application domains. The retrieved Pareto-optimal solutions from our framework improve state-of-the-art designs, e.g., achieving a 50% performance increase on matrix operations with 15% energy overhead relative to the FIMDRAM design. In addition, the exploration of DRAM shows the interplay between available internal bandwidth, performance, and area overhead. For example, a threefold increase in bandwidth rises performance by 47% across workloads at a 34% extra area cost.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4117-4129"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios FlexFL:在不确定场景中通过 APoZ 引导的灵活剪枝进行异构联合学习
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3444695
Zekai Chen;Chentao Jia;Ming Hu;Xiaofei Xie;Anran Li;Mingsong Chen
{"title":"FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios","authors":"Zekai Chen;Chentao Jia;Ming Hu;Xiaofei Xie;Anran Li;Mingsong Chen","doi":"10.1109/TCAD.2024.3444695","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3444695","url":null,"abstract":"Along with the increasing popularity of deep learning (DL) techniques, more and more Artificial Intelligence of Things (AIoT) systems are adopting federated learning (FL) to enable privacy-aware collaborative learning among the AIoT devices. However, due to the inherent data and device heterogeneity issues, the existing FL-based AIoT systems suffer from the model selection problem. Although various heterogeneous FL methods have been investigated to enable collaborative training among the heterogeneous models, there is still a lack of 1) wise heterogeneous model generation methods for the devices; 2) consideration of uncertain factors; and 3) performance guarantee for the large models, thus strongly limiting the overall FL performance. To address the above issues, this article introduces a novel heterogeneous FL framework named FlexFL. By adopting our average percentage of zeros (APoZ)-guided flexible pruning strategy, FlexFL can effectively derive best-fit models for the heterogeneous devices to explore their greatest potential. Meanwhile, our proposed adaptive local pruning strategy allows the AIoT devices to prune their received models according to their varying resources within uncertain scenarios. Moreover, based on the self-knowledge distillation, FlexFL can enhance the inference performance of the large models by learning the knowledge from the small models. Comprehensive experimental results show that, compared to the state-of-the-art heterogeneous FL methods, FlexFL can significantly improve the overall inference accuracy by up to 14.24%. Our code can be found here \u0000<uri>https://github.com/mastlab-T3S/FlexFL</uri>\u0000.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4069-4080"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models on Heterogeneous Microcontrollers Deeploy:在异构微控制器上实现小型语言模型的高能效部署
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443718
Moritz Scherer;Luka Macan;Victor J. B. Jung;Philip Wiese;Luca Bompani;Alessio Burrello;Francesco Conti;Luca Benini
{"title":"Deeploy: Enabling Energy-Efficient Deployment of Small Language Models on Heterogeneous Microcontrollers","authors":"Moritz Scherer;Luka Macan;Victor J. B. Jung;Philip Wiese;Luca Bompani;Alessio Burrello;Francesco Conti;Luca Benini","doi":"10.1109/TCAD.2024.3443718","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443718","url":null,"abstract":"With the rise of embodied foundation models (EFMs), most notably small language models (SLMs), adapting Transformers for the edge applications has become a very active field of research. However, achieving the end-to-end deployment of SLMs on the microcontroller (MCU)-class chips without high-bandwidth off-chip main memory access is still an open challenge. In this article, we demonstrate high efficiency end-to-end SLM deployment on a multicore RISC-V (RV32) MCU augmented with ML instruction extensions and a hardware neural processing unit (NPU). To automate the exploration of the constrained, multidimensional memory versus computation tradeoffs involved in the aggressive SLM deployment on the heterogeneous (multicore+NPU) resources, we introduce Deeploy, a novel deep neural network (DNN) compiler, which generates highly optimized C code requiring minimal runtime support. We demonstrate that Deeploy generates the end-to-end code for executing SLMs, fully exploiting the RV32 cores’ instruction extensions and the NPU. We achieve leading-edge energy and throughput of \u0000<inline-formula> <tex-math>$490 ; mu $ </tex-math></inline-formula>\u0000J per token, at 340 token per second for an SLM trained on the TinyStories dataset, running for the first time on an MCU-class device without the external memory.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4009-4020"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Analysis and Prevention of MCAS-Induced Crashes 分析和预防 MCAS 引发的碰撞事故
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3438105
Noah T. Curran;Thomas W. Kennings;Kang G. Shin
{"title":"Analysis and Prevention of MCAS-Induced Crashes","authors":"Noah T. Curran;Thomas W. Kennings;Kang G. Shin","doi":"10.1109/TCAD.2024.3438105","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3438105","url":null,"abstract":"Semi-autonomous (SA) systems face the C\u0000<sc>hallenge</small>\u0000 of determining which source to prioritize for control, whether it is from the human operator or the autonomous controller, especially when they conflict with each other. While one may design an SA system to default to accepting control from one or the other, such design choices can have catastrophic consequences in safety-critical settings. For instance, the sensors an autonomous controller relies upon may provide incorrect information about the environment due to tampering or natural fault. On the other hand, the human operator may also provide erroneous input. To better understand the consequences and resolution of this safety-critical design choice, we investigate a specific application of an SA system that failed due to a static assignment of control authority: the well-publicized Boeing 737-MAX maneuvering characteristics augmentation system (MCAS) that caused the crashes of Lion Air Flight 610 and Ethiopian Airlines Flight 302. First, using a representative simulation, we analyze and demonstrate the ease by which the original MCAS design could fail. Our analysis reveals the most robust public analysis of aircraft recoverability under MCAS faults, offering bounds for those scenarios beyond the original crashes. We also analyze Boeing’s updated MCAS and show how it falls short of its intended goals and continues to rely upon on a fault-prone static assignment of control priority. Using these insights, we present SA-MCAS, a new MCAS that both meets the intended goals of MCAS and avoids the failure cases that plague both MCAS designs. We demonstrate SA-MCAS’s ability to make safer and timely control decisions of the aircraft, even when the human and autonomous operators provide conflicting control inputs.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3382-3394"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HLS-Based Approach for Embedded Real-Time Ray Tracing in Wireless Communications 基于 HLS 的无线通信嵌入式实时光线跟踪方法
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3446710
Jintong An;Selma Saidi
{"title":"HLS-Based Approach for Embedded Real-Time Ray Tracing in Wireless Communications","authors":"Jintong An;Selma Saidi","doi":"10.1109/TCAD.2024.3446710","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446710","url":null,"abstract":"With the development of wireless communication technology, complex and dynamic scenarios pose great challenges to the Quality of Service (QoS) of wireless communication, especially in indoor scenarios. The quality of beam management can be greatly improved if signal ray-tracing module is embedded in wireless devices to handle synthetic multipath transmissions in real time. In this article, a novel reflection path derivation algorithm for ray tracing of signal beams is proposed, which builds the core mechanism of the proposed FPGA accelerator for ray tracing: by decomposing the computation of the entire ray path into mutually independent subproblems associated with the respective planes involved in the reflection and implemented by independent processing element on FPGAs, the parallelization of the entire ray tracing is realized, which significantly improves the convergence speed of the ray tracing; meanwhile, a new high-level synthesis workflow corresponds to the proposed algorithm and hardware architecture is proposed, which opens the door on synthesizing embedded hardware dedicated for robust and real-time wireless communication. After validation, the method proposed in this article can generate FPGA accelerator for real-time ray-tracing effectively, which achieves ray-tracing simulation in milliseconds.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3720-3731"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
MII: A Multifaceted Framework for Intermittence-Aware Inference and Scheduling MII:间歇感知推理和调度的多元框架
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3443710
Ziliang Zhang;Cong Liu;Hyoseung Kim
{"title":"MII: A Multifaceted Framework for Intermittence-Aware Inference and Scheduling","authors":"Ziliang Zhang;Cong Liu;Hyoseung Kim","doi":"10.1109/TCAD.2024.3443710","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443710","url":null,"abstract":"The concurrent execution of deep neural networks (DNNs) inference tasks on the intermittently-powered batteryless devices (IPDs) has recently garnered much attention due to its potential in a broad range of smart sensing applications. While the checkpointing mechanisms (CMs) provided by the state-of-the-art make this possible, scheduling inference tasks on IPDs is still a complex problem due to significant performance variations across the DNN layers and CM choices. This complexity is further accentuated by dynamic environmental conditions and inherent resource constraints of IPDs. To tackle these challenges, we present MII, a framework designed for the intermittence-aware inference and scheduling on IPDs. MII formulates the shutdown and live time functions of an IPD from profiling the data, which our offline intermittence-aware search scheme uses to find the optimal layer-wise CMs for each task. At runtime, MII enhances the job success rates by dynamically making scheduling decisions to mitigate the workload losses from the power interruptions and adjusting these CMs in response to the actual energy patterns. Our evaluation demonstrates the superiority of MII over the state-of-the-art. In controlled environments, MII achieves an average increase of 21% and 39% in successful jobs under the stable and dynamic energy patterns. In the real-world settings, MII achieves 33% and 24% more successful jobs indoors and outdoors.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3708-3719"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HMC-FHE: A Heterogeneous Near Data Processing Framework for Homomorphic Encryption HMC-FHE:同态加密的异构近数据处理框架
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3447212
Zehao Chen;Zhining Cao;Zhaoyan Shen;Lei Ju
{"title":"HMC-FHE: A Heterogeneous Near Data Processing Framework for Homomorphic Encryption","authors":"Zehao Chen;Zhining Cao;Zhaoyan Shen;Lei Ju","doi":"10.1109/TCAD.2024.3447212","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447212","url":null,"abstract":"Fully homomorphic encryption (FHE) offers a promising solution to ensure data privacy by enabling computations directly on encrypted data. However, its notorious performance degradation severely limits the practical application, due to the explosion of both the ciphertext volume and computation. In this article, leveraging the diversity of computing power and memory bandwidth requirements of FHE operations, we present HMC-FHE, a robust acceleration framework that combines both GPU and hybrid memory cube (HMC) processing engines to accelerate FHE applications cooperatively. HMC-FHE incorporates four key hardware/software co-design techniques: 1) a fine-grained kernel offloading mechanism to efficiently offload FHE operations to relevant processing engines; 2) a ciphertext partitioning scheme to minimize data transfer across decentralized HMC processing engines; 3) an FHE operation pipeline scheme to facilitate pipelined execution between GPU and HMC engines; and 4) a kernel tuning scheme to guarantee the parallelism of GPU and HMC engines. We demonstrate that the GPU-HMC architecture with proper resource management serves as a promising acceleration scheme for memory-intensive FHE operations. Compared with the state-of-the-art GPU-based acceleration scheme, the proposed framework achieves up to \u0000<inline-formula> <tex-math>$2.65times $ </tex-math></inline-formula>\u0000 performance gains and reduces \u0000<inline-formula> <tex-math>$1.81times $ </tex-math></inline-formula>\u0000 energy consumption with the same peak computation capacity.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3551-3563"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595799","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Polynomial Neural Barrier Certificate Synthesis of Hybrid Systems via Counterexample Guidance 通过反例指导合成混合系统的多项式神经障碍证书
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3447226
Hanrui Zhao;Banglong Liu;Lydia Dehbi;Huijiao Xie;Zhengfeng Yang;Haifeng Qian
{"title":"Polynomial Neural Barrier Certificate Synthesis of Hybrid Systems via Counterexample Guidance","authors":"Hanrui Zhao;Banglong Liu;Lydia Dehbi;Huijiao Xie;Zhengfeng Yang;Haifeng Qian","doi":"10.1109/TCAD.2024.3447226","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447226","url":null,"abstract":"This article presents a novel approach to the safety verification of hybrid systems by synthesizing neural barrier certificates (BCs) via counterexample-guided neural network (NN) learning combined with sum-of-square (SOS)-based verification. We learn more easily verifiable BCs with NN polynomial expansions in a high-accuracy counterexamples guided framework. By leveraging the polynomial candidates yielded from the learning phase, we reformulate the identification of real BCs as convex linear matrix inequality (LMI) feasibility testing problems, instead of directly solving the inherently NP-hard nonconvex bilinear matrix inequality (BMI) problems associated with SOS-based BC generation. Furthermore, we decompose the large SOS verification programming into several manageable subprogrammings. Benefiting from the efficiency and scalability advantages, our approach can synthesize BCs not amenable to existing methods and handle more general hybrid systems.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3756-3767"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
ML-Based Thermal and Cache Contention Alleviation on Clustered Manycores With 3-D HBM 利用 3-D HBM 在集群 Manycores 上实现基于 ML 的散热和缓存争用缓解
IF 2.7 3区 计算机科学
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pub Date : 2024-11-06 DOI: 10.1109/TCAD.2024.3438998
Mohammed Bakr Sikal;Heba Khdr;Lokesh Siddhu;Jörg Henkel
{"title":"ML-Based Thermal and Cache Contention Alleviation on Clustered Manycores With 3-D HBM","authors":"Mohammed Bakr Sikal;Heba Khdr;Lokesh Siddhu;Jörg Henkel","doi":"10.1109/TCAD.2024.3438998","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3438998","url":null,"abstract":"Enabled by the recent advancements in 2.5D/3-D integration and packaging, the integration of clustered manycore processors with high-bandwidth memory (HBM) is gaining prominence to satisfy the increasing memory bandwidth demands. Although this integration can offer significant performance gains, it is still limited by cache contention in the final-level cache on the clusters and by the thermal issues in the 3-D HBM. While the existing state-of-the-art resource management techniques have tackled these issues in isolation, we argue that the cache contention and the temperature of both the manycore and the HBM must be considered jointly to harness the full performance potential of such modern architectures. To cover this gap in the literature, we present MTCM, the first resource management technique that considers the cache contention in maximizing the system performance, while maintaining the thermal safety across both the manycore and the HBM stack. Enabled by our accurate, yet lightweight, neural network models, our proposed task migration and dynamic voltage and frequency scaling policies can accurately predict the impact of runtime decisions on the performance and temperature of both the subsystems. Our extensive evaluation experiments reveal a significant performance improvement over existing state of the art by up to \u0000<inline-formula> <tex-math>$1times $ </tex-math></inline-formula>\u0000, while maintaining thermal safety of both the manycore and the HBM.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3614-3625"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信