NPU-Accelerated Imitation Learning for Thermal- and QoS-Aware Optimization of Heterogeneous Multi-Cores

Martin Rapp, Nikita Krohmer, Heba Khdr, J. Henkel
{"title":"基于npu加速模仿学习的异构多核热感知和qos感知优化","authors":"Martin Rapp, Nikita Krohmer, Heba Khdr, J. Henkel","doi":"10.23919/DATE54114.2022.9774681","DOIUrl":null,"url":null,"abstract":"Task migration and dynamic voltage and frequency scaling (DVFS) are indispensable means in thermal optimization of a heterogeneous clustered multi-core processor under user-defined quality of service (QoS) targets. However, selecting the core to execute each application and the voltage/frequency (V/f) levels of each cluster is a complex problem because 1) the diverse characteristics and QoS targets of applications require different optimizations, and 2) V/f levels are often shared between cores on a cluster, which requires a global optimization considering all running applications. State-of-the-art techniques for power or temperature minimization either rely on measurements that are often not available (such as power) or fail to consider all the dimensions of the problem (e.g., by using simplified analytical models). Imitation learning (IL) enables to use the optimality of an oracle policy, yet at low run-time overhead, by training a model from oracle demonstrations. We are the first to employ IL for temperature minimization under QoS targets. We tackle the complexity by using a neural network (NN) model and accelerate the NN inference using a neural processing unit (NPU). While such NN accelerators are becoming increasingly widespread on end devices, they are so far only used to accelerate user applications. In contrast, we use an accelerator on a real platform to accelerate NN-based resource management. Our evaluation on a HiKey970 board with an Arm big.LITTLE CPU and an NPU shows significant temperature reductions at a negligible overhead while satisfying OoS targets.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"NPU-Accelerated Imitation Learning for Thermal- and QoS-Aware Optimization of Heterogeneous Multi-Cores\",\"authors\":\"Martin Rapp, Nikita Krohmer, Heba Khdr, J. Henkel\",\"doi\":\"10.23919/DATE54114.2022.9774681\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Task migration and dynamic voltage and frequency scaling (DVFS) are indispensable means in thermal optimization of a heterogeneous clustered multi-core processor under user-defined quality of service (QoS) targets. However, selecting the core to execute each application and the voltage/frequency (V/f) levels of each cluster is a complex problem because 1) the diverse characteristics and QoS targets of applications require different optimizations, and 2) V/f levels are often shared between cores on a cluster, which requires a global optimization considering all running applications. State-of-the-art techniques for power or temperature minimization either rely on measurements that are often not available (such as power) or fail to consider all the dimensions of the problem (e.g., by using simplified analytical models). Imitation learning (IL) enables to use the optimality of an oracle policy, yet at low run-time overhead, by training a model from oracle demonstrations. We are the first to employ IL for temperature minimization under QoS targets. 
We tackle the complexity by using a neural network (NN) model and accelerate the NN inference using a neural processing unit (NPU). While such NN accelerators are becoming increasingly widespread on end devices, they are so far only used to accelerate user applications. In contrast, we use an accelerator on a real platform to accelerate NN-based resource management. Our evaluation on a HiKey970 board with an Arm big.LITTLE CPU and an NPU shows significant temperature reductions at a negligible overhead while satisfying OoS targets.\",\"PeriodicalId\":232583,\"journal\":{\"name\":\"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-03-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/DATE54114.2022.9774681\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/DATE54114.2022.9774681","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Task migration and dynamic voltage and frequency scaling (DVFS) are indispensable means for the thermal optimization of a heterogeneous clustered multi-core processor under user-defined quality of service (QoS) targets. However, selecting the core to execute each application and the voltage/frequency (V/f) levels of each cluster is a complex problem because 1) the diverse characteristics and QoS targets of applications require different optimizations, and 2) V/f levels are often shared between cores on a cluster, which requires a global optimization considering all running applications. State-of-the-art techniques for power or temperature minimization either rely on measurements that are often not available (such as power) or fail to consider all the dimensions of the problem (e.g., by using simplified analytical models). Imitation learning (IL) makes it possible to exploit the optimality of an oracle policy at low run-time overhead by training a model from oracle demonstrations. We are the first to employ IL for temperature minimization under QoS targets. We tackle the complexity by using a neural network (NN) model and accelerate the NN inference using a neural processing unit (NPU). While such NN accelerators are becoming increasingly widespread on end devices, they are so far only used to accelerate user applications. In contrast, we use an accelerator on a real platform to accelerate NN-based resource management. Our evaluation on a HiKey970 board with an Arm big.LITTLE CPU and an NPU shows significant temperature reductions at a negligible overhead while satisfying QoS targets.
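
The abstract only outlines the approach, so the sketch below is purely illustrative and not the authors' implementation. It shows a minimal behavioral-cloning setup in PyTorch, assuming a small two-headed MLP that maps a state vector (per-application performance counters, QoS slack, temperature readings) to a cluster choice for the task-migration decision and a discrete V/f level for the DVFS decision, trained with cross-entropy losses on oracle demonstrations. All feature names, dimensions, and the `ILPolicy` / `train_on_demonstrations` helpers are hypothetical.

```python
# Illustrative sketch only (not the paper's code): behavioral cloning of an oracle
# thermal/QoS resource-management policy with a small neural network in PyTorch.
# Feature names, dimensions, and the two-head layout are assumptions.

import torch
import torch.nn as nn

N_FEATURES = 16    # assumed state size: per-app counters, QoS slack, temperatures, ...
N_CLUSTERS = 2     # e.g., big and LITTLE clusters of a big.LITTLE CPU
N_VF_LEVELS = 8    # assumed number of discrete V/f levels per cluster


class ILPolicy(nn.Module):
    """Two-headed MLP: picks a cluster for the current app and a V/f level for it."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(N_FEATURES, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.cluster_head = nn.Linear(64, N_CLUSTERS)   # task-migration decision
        self.vf_head = nn.Linear(64, N_VF_LEVELS)       # DVFS decision

    def forward(self, state):
        h = self.backbone(state)
        return self.cluster_head(h), self.vf_head(h)


def train_on_demonstrations(states, oracle_clusters, oracle_vf, epochs=50):
    """Supervised imitation: fit the NN to the oracle's (cluster, V/f) choices."""
    policy = ILPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        cluster_logits, vf_logits = policy(states)
        loss = ce(cluster_logits, oracle_clusters) + ce(vf_logits, oracle_vf)
        loss.backward()
        opt.step()
    return policy


if __name__ == "__main__":
    # Synthetic stand-in for oracle demonstrations (state, cluster label, V/f label).
    states = torch.randn(1024, N_FEATURES)
    oracle_clusters = torch.randint(0, N_CLUSTERS, (1024,))
    oracle_vf = torch.randint(0, N_VF_LEVELS, (1024,))
    policy = train_on_demonstrations(states, oracle_clusters, oracle_vf)

    # At run time, the trained model would be invoked periodically by the resource
    # manager; on a platform like the HiKey970 it could be offloaded to the NPU
    # (after conversion to the vendor's model format) to keep the overhead low.
    with torch.no_grad():
        cluster_logits, vf_logits = policy(states[:1])
        print("cluster:", cluster_logits.argmax(1).item(),
              "V/f level:", vf_logits.argmax(1).item())
```

Note that this sketch decides for one application at a time, whereas the policy described in the abstract must make a global decision across all running applications because V/f levels are shared within a cluster; extending the state and outputs to cover all applications and clusters jointly would be one way to reflect that, but the exact formulation is not specified here.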