Title: Optimizing Thermal Performance in 2.5D Systems Using Embedded Isolators
Authors: George Karfakis; Myriam Bouzidi; Yunhyeok Im; Alexander Graening; Suresh K. Sitaraman; Puneet Gupta
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 15, no. 3, pp. 458-468
Published: 2025-08-05
DOI: 10.1109/JETCAS.2025.3595909
URL: https://ieeexplore.ieee.org/document/11113276/
Journal impact factor: 3.8 (JCR Q2, Engineering, Electrical & Electronic)
Citations: 0
Abstract
This paper investigates thermal management in tightly integrated heterogeneous chiplet systems, focusing on a novel approach using embedded thermal isolators. In many 2.5D systems, such as modern enterprise GPUs, thermally sensitive chiplets like High Bandwidth Memory (HBM) are thermally coupled to high-power compute chiplets, leading to performance degradation. We propose and evaluate thermal isolators embedded within the heat spreader to thermally decouple chiplets. Our thermal simulations of a water-cooled 2.5D integrated GPU system indicate that conventional approaches like thermally aware floorplanning are less effective because heat transfer through the heat spreader dominates. In contrast, our proposed thermal isolators can significantly increase thermal isolation between chiplets (by up to 61%), or even reduce overall average peak chip temperature (by up to 22.5%). We develop a closed-loop workflow incorporating thermal results to quantify the performance impact of thermally induced throttling, finding that in an example GPU+HBM system, the isolator approach can yield performance gains of up to 37% for memory-bound workloads. These findings open up new avenues for thermal management and thermal-system co-optimization in 2.5D heterogeneous integrated systems, potentially enabling more efficient and higher-performing chiplet-based architectures.
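The coupling effect described in the abstract can be illustrated with a toy two-node lumped thermal network. This is only a sketch under stated assumptions, not the simulation setup used in the paper: each chiplet is reduced to a single node with a vertical resistance to the coolant, and the heat spreader is reduced to a single lateral resistance `r_couple` between the two nodes. The function name, parameter names, and all numeric values are hypothetical and chosen for illustration. In this simplified picture, an embedded isolator corresponds to a larger `r_couple`.

```python
def steady_state_temps(p_gpu, p_hbm, r_gpu, r_hbm, r_couple, t_coolant):
    """Steady-state GPU and HBM temperatures (deg C) for a two-node network.

    Hypothetical toy model: powers in W, resistances in K/W.
    Heat balance at each node, with rise = T - t_coolant:
        rise_gpu / r_gpu + (rise_gpu - rise_hbm) / r_couple = p_gpu
        rise_hbm / r_hbm + (rise_hbm - rise_gpu) / r_couple = p_hbm
    """
    # Work with conductances so the 2x2 system is easy to solve by hand.
    g_gpu, g_hbm, g_c = 1.0 / r_gpu, 1.0 / r_hbm, 1.0 / r_couple
    # Determinant of [[g_gpu + g_c, -g_c], [-g_c, g_hbm + g_c]].
    det = g_gpu * g_hbm + g_c * (g_gpu + g_hbm)
    # Cramer's rule for the two temperature rises above the coolant.
    rise_gpu = (p_gpu * (g_hbm + g_c) + g_c * p_hbm) / det
    rise_hbm = (p_hbm * (g_gpu + g_c) + g_c * p_gpu) / det
    return t_coolant + rise_gpu, t_coolant + rise_hbm


if __name__ == "__main__":
    # Assumed values: 300 W GPU, 20 W HBM stack, 30 deg C coolant.
    common = dict(p_gpu=300.0, p_hbm=20.0, r_gpu=0.1, r_hbm=0.5, t_coolant=30.0)
    for label, r_couple in [("plain spreader", 0.2), ("with isolator", 2.0)]:
        t_gpu, t_hbm = steady_state_temps(r_couple=r_couple, **common)
        print(f"{label}: GPU {t_gpu:.1f} C, HBM {t_hbm:.1f} C")
```

With these assumed values, raising the lateral resistance lowers the HBM temperature toward its uncoupled value while the GPU temperature rises slightly. The toy model captures only this isolation trade-off; it cannot reproduce the reduction in overall peak temperature reported in the abstract, which depends on the full 3D geometry and cooling configuration.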
About the journal:
The IEEE Journal on Emerging and Selected Topics in Circuits and Systems is published quarterly and solicits, with particular emphasis on emerging areas, special issues on topics that cover the entire scope of the IEEE Circuits and Systems (CAS) Society, namely the theory, analysis, design, tools, and implementation of circuits and systems, spanning their theoretical foundations, applications, and architectures for signal and information processing.