Mohammed Bakr Sikal;Heba Khdr;Lokesh Siddhu;Jörg Henkel
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 43, no. 11, pp. 3614-3625. Published 2024-11-06.
DOI: 10.1109/TCAD.2024.3438998
https://ieeexplore.ieee.org/document/10745850/
ML-Based Thermal and Cache Contention Alleviation on Clustered Manycores With 3-D HBM
Enabled by the recent advancements in 2.5D/3-D integration and packaging, the integration of clustered manycore processors with high-bandwidth memory (HBM) is gaining prominence to satisfy the increasing memory bandwidth demands. Although this integration can offer significant performance gains, it is still limited by cache contention in the final-level cache on the clusters and by the thermal issues in the 3-D HBM. While the existing state-of-the-art resource management techniques have tackled these issues in isolation, we argue that the cache contention and the temperature of both the manycore and the HBM must be considered jointly to harness the full performance potential of such modern architectures. To cover this gap in the literature, we present MTCM, the first resource management technique that considers the cache contention in maximizing the system performance, while maintaining the thermal safety across both the manycore and the HBM stack. Enabled by our accurate, yet lightweight, neural network models, our proposed task migration and dynamic voltage and frequency scaling policies can accurately predict the impact of runtime decisions on the performance and temperature of both the subsystems. Our extensive evaluation experiments reveal a significant performance improvement over existing state of the art by up to $1\times$, while maintaining thermal safety of both the manycore and the HBM.
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.