AVX-512应用程序的自动核心专业化

Proceedings of the 13th ACM International Systems and Storage Conference Pub Date : 2020-05-30 DOI:10.1145/3383669.3398282

Mathias Gottschlag, Peter Brantsch, Frank Bellosa

{"title":"AVX-512应用程序的自动核心专业化","authors":"Mathias Gottschlag, Peter Brantsch, Frank Bellosa","doi":"10.1145/3383669.3398282","DOIUrl":null,"url":null,"abstract":"Advanced Vector Extension (AVX) instructions operate on wide SIMD vectors. Due to the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. Following non-AVX code is slowed down by this frequency reduction in two situations: When it executes on the sibling hyperthread of the same core in parallel or - as restoring the non-AVX frequency is delayed - when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average. In this work, we describe a method to mitigate the frequency reduction slowdown for workloads involving AVX-512 instructions in both situations. Our approach employs core specialization and partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach is able to reduce the frequency reduction overhead by 70% for an assortment of common benchmarks.","PeriodicalId":225327,"journal":{"name":"Proceedings of the 13th ACM International Systems and Storage Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Automatic Core Specialization for AVX-512 Applications\",\"authors\":\"Mathias Gottschlag, Peter Brantsch, Frank Bellosa\",\"doi\":\"10.1145/3383669.3398282\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Advanced Vector Extension (AVX) instructions operate on wide SIMD vectors. Due to the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. Following non-AVX code is slowed down by this frequency reduction in two situations: When it executes on the sibling hyperthread of the same core in parallel or - as restoring the non-AVX frequency is delayed - when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average. In this work, we describe a method to mitigate the frequency reduction slowdown for workloads involving AVX-512 instructions in both situations. Our approach employs core specialization and partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach is able to reduce the frequency reduction overhead by 70% for an assortment of common benchmarks.\",\"PeriodicalId\":225327,\"journal\":{\"name\":\"Proceedings of the 13th ACM International Systems and Storage Conference\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-05-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 13th ACM International Systems and Storage Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3383669.3398282\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 13th ACM International Systems and Storage Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3383669.3398282","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

高级矢量扩展(AVX)指令操作宽SIMD矢量。由于由此产生的高功耗，最近的英特尔处理器在执行复杂的AVX2和AVX-512指令时降低了频率。在以下两种情况下，执行非avx代码会因频率降低而减慢:当它在同一核心的兄弟超线程上并行执行时，或者当它直接遵循AVX2/AVX-512代码时，因为恢复非avx频率会延迟。因此，由AVX-512和非avx代码组成的异构工作负载通常会平均降低10%。在这项工作中，我们描述了一种在两种情况下减轻涉及AVX-512指令的工作负载的频率降低速度的方法。我们的方法采用核心专门化，并将CPU内核划分为AVX-512内核和非AVX-512内核，只有前者执行AVX-512指令，因此潜在的频率降低的影响仅限于这些内核。为了将线程迁移到AVX-512内核，我们将非AVX-512内核配置为在执行AVX-512指令时引发异常。我们使用启发式来确定何时将线程迁移回非avx -512内核。我们的方法能够将各种常见基准测试的频率降低开销减少70%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Automatic Core Specialization for AVX-512 Applications

Advanced Vector Extension (AVX) instructions operate on wide SIMD vectors. Due to the resulting high power consumption, recent Intel processors reduce their frequency when executing complex AVX2 and AVX-512 instructions. Following non-AVX code is slowed down by this frequency reduction in two situations: When it executes on the sibling hyperthread of the same core in parallel or - as restoring the non-AVX frequency is delayed - when it directly follows the AVX2/AVX-512 code. As a result, heterogeneous workloads consisting of AVX-512 and non-AVX code are frequently slowed down by 10% on average. In this work, we describe a method to mitigate the frequency reduction slowdown for workloads involving AVX-512 instructions in both situations. Our approach employs core specialization and partitions the CPU cores into AVX-512 cores and non-AVX-512 cores, and only the former execute AVX-512 instructions so that the impact of potential frequency reductions is limited to those cores. To migrate threads to AVX-512 cores, we configure the non-AVX-512 cores to raise an exception when executing AVX-512 instructions. We use a heuristic to determine when to migrate threads back to non-AVX-512 cores. Our approach is able to reduce the frequency reduction overhead by 70% for an assortment of common benchmarks.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 13th ACM International Systems and Storage Conference

自引率

0.00%

发文量