三维多处理器片上系统协同设计中动态热管理策略的评估框架

IF 5.6 2区 计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS
Darong Huang;Luis Costero;David Atienza
{"title":"三维多处理器片上系统协同设计中动态热管理策略的评估框架","authors":"Darong Huang;Luis Costero;David Atienza","doi":"10.1109/TPDS.2024.3459414","DOIUrl":null,"url":null,"abstract":"Dynamic thermal management (DTM) has been widely adopted to improve the energy efficiency, reliability, and performance of modern Multi-Processor SoCs (MPSoCs). However, the evolving industry trends and heterogeneous architecture designs have introduced significant challenges in state-of-the-art DTM methods. Specifically, the emergence of heterogeneous design has led to increased localized and non-uniform hotspots, necessitating accurate and responsive DTM strategies. Additionally, the increased number of cores to be managed requires the DTM to optimize and coordinate the whole system. However, existing methodologies fail in both precise thermal modeling in localized hotspots and fast architecture simulation. To tackle these existing challenges, we first introduce the latest version of 3D-ICE 3.1, with a novel non-uniform thermal modeling technique to support customized discretization levels of thermal grids. 3D-ICE 3.1 improves the accuracy of thermal analysis and reduces simulation overhead. Then, in conjunction with an efficient and fast offline application profiling strategy utilizing the architecture simulator gem5-X, we propose a novel DTM evaluation framework. This framework enables us to explore novel DTM methods to optimize the energy efficiency, reliability, and performance of contemporary 3D MPSoCs. The experimental results demonstrate that 3D-ICE 3.1 achieves high accuracy, with only 0.3K mean temperature error. Subsequently, we evaluate various DTM methods and propose a Multi-Agent Reinforcement Learning (MARL) control to address the demanding thermal challenges of 3D MPSoCs. Our experimental results show that the proposed DTM method based on MARL can reduce power consumption by 13% while maintaining a similar performance level to the comparison methods.","PeriodicalId":13257,"journal":{"name":"IEEE Transactions on Parallel and Distributed Systems","volume":"35 11","pages":"2161-2176"},"PeriodicalIF":5.6000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Evaluation Framework for Dynamic Thermal Management Strategies in 3D MultiProcessor System-on-Chip Co-Design\",\"authors\":\"Darong Huang;Luis Costero;David Atienza\",\"doi\":\"10.1109/TPDS.2024.3459414\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dynamic thermal management (DTM) has been widely adopted to improve the energy efficiency, reliability, and performance of modern Multi-Processor SoCs (MPSoCs). However, the evolving industry trends and heterogeneous architecture designs have introduced significant challenges in state-of-the-art DTM methods. Specifically, the emergence of heterogeneous design has led to increased localized and non-uniform hotspots, necessitating accurate and responsive DTM strategies. Additionally, the increased number of cores to be managed requires the DTM to optimize and coordinate the whole system. However, existing methodologies fail in both precise thermal modeling in localized hotspots and fast architecture simulation. To tackle these existing challenges, we first introduce the latest version of 3D-ICE 3.1, with a novel non-uniform thermal modeling technique to support customized discretization levels of thermal grids. 3D-ICE 3.1 improves the accuracy of thermal analysis and reduces simulation overhead. Then, in conjunction with an efficient and fast offline application profiling strategy utilizing the architecture simulator gem5-X, we propose a novel DTM evaluation framework. This framework enables us to explore novel DTM methods to optimize the energy efficiency, reliability, and performance of contemporary 3D MPSoCs. The experimental results demonstrate that 3D-ICE 3.1 achieves high accuracy, with only 0.3K mean temperature error. Subsequently, we evaluate various DTM methods and propose a Multi-Agent Reinforcement Learning (MARL) control to address the demanding thermal challenges of 3D MPSoCs. Our experimental results show that the proposed DTM method based on MARL can reduce power consumption by 13% while maintaining a similar performance level to the comparison methods.\",\"PeriodicalId\":13257,\"journal\":{\"name\":\"IEEE Transactions on Parallel and Distributed Systems\",\"volume\":\"35 11\",\"pages\":\"2161-2176\"},\"PeriodicalIF\":5.6000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Parallel and Distributed Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10678921/\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Parallel and Distributed Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10678921/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0

摘要

动态热管理(DTM)已被广泛采用,以提高现代多处理器系统级芯片(MPSoC)的能效、可靠性和性能。然而,不断发展的行业趋势和异构架构设计给最先进的 DTM 方法带来了重大挑战。具体来说,异构设计的出现导致局部和非均匀热点的增加,这就要求采用准确、灵敏的 DTM 策略。此外,需要管理的内核数量增加,要求 DTM 对整个系统进行优化和协调。然而,现有方法在局部热点的精确热建模和快速架构仿真方面都存在不足。为了应对这些现有挑战,我们首先介绍了最新版本的 3D-ICE 3.1,该版本采用了新颖的非均匀热建模技术,支持自定义热网格离散等级。3D-ICE 3.1 提高了热分析的精度,降低了仿真开销。然后,结合利用架构模拟器 gem5-X 的高效、快速离线应用剖析策略,我们提出了一个新颖的 DTM 评估框架。该框架使我们能够探索新型 DTM 方法,以优化当代 3D MPSoC 的能效、可靠性和性能。实验结果表明,3D-ICE 3.1 实现了高精度,平均温度误差仅为 0.3K。随后,我们对各种 DTM 方法进行了评估,并提出了多代理强化学习 (MARL) 控制方法,以应对 3D MPSoC 在散热方面的严峻挑战。实验结果表明,基于 MARL 提出的 DTM 方法可以降低 13% 的功耗,同时保持与比较方法类似的性能水平。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
An Evaluation Framework for Dynamic Thermal Management Strategies in 3D MultiProcessor System-on-Chip Co-Design
Dynamic thermal management (DTM) has been widely adopted to improve the energy efficiency, reliability, and performance of modern Multi-Processor SoCs (MPSoCs). However, the evolving industry trends and heterogeneous architecture designs have introduced significant challenges in state-of-the-art DTM methods. Specifically, the emergence of heterogeneous design has led to increased localized and non-uniform hotspots, necessitating accurate and responsive DTM strategies. Additionally, the increased number of cores to be managed requires the DTM to optimize and coordinate the whole system. However, existing methodologies fail in both precise thermal modeling in localized hotspots and fast architecture simulation. To tackle these existing challenges, we first introduce the latest version of 3D-ICE 3.1, with a novel non-uniform thermal modeling technique to support customized discretization levels of thermal grids. 3D-ICE 3.1 improves the accuracy of thermal analysis and reduces simulation overhead. Then, in conjunction with an efficient and fast offline application profiling strategy utilizing the architecture simulator gem5-X, we propose a novel DTM evaluation framework. This framework enables us to explore novel DTM methods to optimize the energy efficiency, reliability, and performance of contemporary 3D MPSoCs. The experimental results demonstrate that 3D-ICE 3.1 achieves high accuracy, with only 0.3K mean temperature error. Subsequently, we evaluate various DTM methods and propose a Multi-Agent Reinforcement Learning (MARL) control to address the demanding thermal challenges of 3D MPSoCs. Our experimental results show that the proposed DTM method based on MARL can reduce power consumption by 13% while maintaining a similar performance level to the comparison methods.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems 工程技术-工程:电子与电气
CiteScore
11.00
自引率
9.40%
发文量
281
审稿时长
5.6 months
期刊介绍: IEEE Transactions on Parallel and Distributed Systems (TPDS) is published monthly. It publishes a range of papers, comments on previously published papers, and survey articles that deal with the parallel and distributed systems research areas of current importance to our readers. Particular areas of interest include, but are not limited to: a) Parallel and distributed algorithms, focusing on topics such as: models of computation; numerical, combinatorial, and data-intensive parallel algorithms, scalability of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling, and load balancing. b) Applications of parallel and distributed computing, including computational and data-enabled science and engineering, big data applications, parallel crowd sourcing, large-scale social network analysis, management of big data, cloud and grid computing, scientific and biomedical applications, mobile computing, and cyber-physical systems. c) Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; design, analysis, implementation, fault resilience and performance measurements of multiple-processor systems; multicore processors, heterogeneous many-core systems; petascale and exascale systems designs; novel big data architectures; special purpose architectures, including graphics processors, signal processors, network processors, media accelerators, and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient and green computing architectures; dependable architectures; and performance modeling and evaluation. d) Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, Internet computing and web services, resource management including green computing, middleware for grids, clouds, and data centers, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信