基于机器学习的在线全芯片热图估计

Sheriff Sadiqbatcha, Yue Zhao, Jinwei Zhang, H. Amrouch, J. Henkel, S. Tan
{"title":"基于机器学习的在线全芯片热图估计","authors":"Sheriff Sadiqbatcha, Yue Zhao, Jinwei Zhang, H. Amrouch, J. Henkel, S. Tan","doi":"10.1109/ASP-DAC47756.2020.9045204","DOIUrl":null,"url":null,"abstract":"Runtime power and thermal control is crucial in any modern processor. However, these control schemes require accurate real-time temperature information, ideally of the entire die area, in order to be effective. On-chip temperature sensors alone cannot provide the full-chip temperature information since the number of sensors that are typically available is very limited due to their high area and power overheads. Furthermore, as we will demonstrate, the peak locations within hot-spots are not stationary and are very workload dependent, making it difficult to rely on fixed temperature sensors alone. Therefore, we propose a novel approach to real-time estimation of full-chip transient heatmaps for commercial processors based on machine learning. The model derived in this work supplements the temperature data sensed from the existing on-chip sensors, allowing for the development of more robust runtime power and thermal control schemes that can take advantage of the additional thermal information that is otherwise not available. The new approach involves offline acquisition of accurate spatial and temporal heatmaps using an infrared thermal imaging setup while nominal working conditions are maintained on the chip. To build the dynamic thermal model, we apply Long-Short-Term-Memory (LSTM) neutral networks with system-level variables such as chip frequency, instruction counts, and other performance metrics as inputs. To reduce the dimensionality of the model, 2D spatial discrete cosine transformation (DCT) is first performed on the heatmaps so that they can be expressed with just their dominant DCT frequencies. Our study shows that only 6×6 DCT coefficients are required to maintain sufficient accuracy across a variety of workloads. Experimental results show that the proposed approach can estimate the full-chip heatmaps with less than $1.4^{o}$C root-mean-square-error and take only $\\sim$19ms for each inference which suits well for real-time use.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":"{\"title\":\"Machine Learning Based Online Full-Chip Heatmap Estimation\",\"authors\":\"Sheriff Sadiqbatcha, Yue Zhao, Jinwei Zhang, H. Amrouch, J. Henkel, S. Tan\",\"doi\":\"10.1109/ASP-DAC47756.2020.9045204\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Runtime power and thermal control is crucial in any modern processor. However, these control schemes require accurate real-time temperature information, ideally of the entire die area, in order to be effective. On-chip temperature sensors alone cannot provide the full-chip temperature information since the number of sensors that are typically available is very limited due to their high area and power overheads. Furthermore, as we will demonstrate, the peak locations within hot-spots are not stationary and are very workload dependent, making it difficult to rely on fixed temperature sensors alone. Therefore, we propose a novel approach to real-time estimation of full-chip transient heatmaps for commercial processors based on machine learning. The model derived in this work supplements the temperature data sensed from the existing on-chip sensors, allowing for the development of more robust runtime power and thermal control schemes that can take advantage of the additional thermal information that is otherwise not available. The new approach involves offline acquisition of accurate spatial and temporal heatmaps using an infrared thermal imaging setup while nominal working conditions are maintained on the chip. To build the dynamic thermal model, we apply Long-Short-Term-Memory (LSTM) neutral networks with system-level variables such as chip frequency, instruction counts, and other performance metrics as inputs. To reduce the dimensionality of the model, 2D spatial discrete cosine transformation (DCT) is first performed on the heatmaps so that they can be expressed with just their dominant DCT frequencies. Our study shows that only 6×6 DCT coefficients are required to maintain sufficient accuracy across a variety of workloads. Experimental results show that the proposed approach can estimate the full-chip heatmaps with less than $1.4^{o}$C root-mean-square-error and take only $\\\\sim$19ms for each inference which suits well for real-time use.\",\"PeriodicalId\":125112,\"journal\":{\"name\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"17\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASP-DAC47756.2020.9045204\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC47756.2020.9045204","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

摘要

运行时电源和热控制在任何现代处理器中都是至关重要的。然而,这些控制方案需要精确的实时温度信息,理想情况下是整个模具区域的温度信息,以便有效。片上温度传感器本身不能提供全芯片温度信息,因为通常可用的传感器数量由于其高面积和功耗开销而非常有限。此外,正如我们将演示的那样,热点内的峰值位置不是固定的,并且非常依赖于工作量,因此很难单独依赖固定的温度传感器。因此,我们提出了一种基于机器学习的商业处理器全芯片瞬态热图实时估计的新方法。本工作中导出的模型补充了现有片上传感器感测到的温度数据,允许开发更强大的运行时功率和热控制方案,这些方案可以利用其他方式无法获得的额外热信息。新方法涉及使用红外热成像设置离线获取精确的空间和时间热图,同时在芯片上保持标称工作条件。为了构建动态热模型,我们将长短期记忆(LSTM)中性网络与系统级变量(如芯片频率、指令计数和其他性能指标)作为输入。为了降低模型的维数,首先对热图进行二维空间离散余弦变换(DCT),使热图仅用其主导DCT频率表示。我们的研究表明,只需要6×6 DCT系数就可以在各种工作负载中保持足够的准确性。实验结果表明,该方法能以小于1.4^{o}$C的均方根误差估计全芯片热图,每次推理耗时仅为$\sim$19ms,适合实时使用。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Machine Learning Based Online Full-Chip Heatmap Estimation
Runtime power and thermal control is crucial in any modern processor. However, these control schemes require accurate real-time temperature information, ideally of the entire die area, in order to be effective. On-chip temperature sensors alone cannot provide the full-chip temperature information since the number of sensors that are typically available is very limited due to their high area and power overheads. Furthermore, as we will demonstrate, the peak locations within hot-spots are not stationary and are very workload dependent, making it difficult to rely on fixed temperature sensors alone. Therefore, we propose a novel approach to real-time estimation of full-chip transient heatmaps for commercial processors based on machine learning. The model derived in this work supplements the temperature data sensed from the existing on-chip sensors, allowing for the development of more robust runtime power and thermal control schemes that can take advantage of the additional thermal information that is otherwise not available. The new approach involves offline acquisition of accurate spatial and temporal heatmaps using an infrared thermal imaging setup while nominal working conditions are maintained on the chip. To build the dynamic thermal model, we apply Long-Short-Term-Memory (LSTM) neutral networks with system-level variables such as chip frequency, instruction counts, and other performance metrics as inputs. To reduce the dimensionality of the model, 2D spatial discrete cosine transformation (DCT) is first performed on the heatmaps so that they can be expressed with just their dominant DCT frequencies. Our study shows that only 6×6 DCT coefficients are required to maintain sufficient accuracy across a variety of workloads. Experimental results show that the proposed approach can estimate the full-chip heatmaps with less than $1.4^{o}$C root-mean-square-error and take only $\sim$19ms for each inference which suits well for real-time use.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信