PMEvo: portable inference of port mappings for out-of-order processors by evolutionary optimization

Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation. Pub Date: 2020-04-21. DOI: 10.1145/3385412.3385995

Fabian Ritter, Sebastian Hack


Citations: 12

Abstract

Achieving peak performance in a computer system requires optimizations in every layer of the system, be it hardware or software. A detailed understanding of the underlying hardware, and especially the processor, is crucial for optimizing software. One key criterion for the performance of a processor is its ability to exploit instruction-level parallelism. This ability is determined by the port mapping of the processor, which describes, for each instruction, the execution units that can execute it. Processor manufacturers usually do not share the port mappings of their microarchitectures. While approaches to automatically infer port mappings from experiments exist, they rely on processor-specific hardware performance counters that are not available on every platform. We present PMEvo, a framework that automatically infers port mappings based solely on measurements of the execution time of short instruction sequences. PMEvo uses an evolutionary algorithm that evaluates the fitness of candidate mappings with an analytical throughput model formulated as a linear program. Our prototype implementation infers a port mapping for Intel's Skylake architecture that predicts measured instruction throughput with an accuracy that is competitive with existing work. Furthermore, it finds port mappings for AMD's Zen+ architecture and the ARM Cortex-A72 architecture, which are out of scope for existing techniques.
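To make the fitness evaluation more concrete, here is a minimal sketch in Python. It is not PMEvo's implementation. It assumes a simplified model in which each instruction occupies exactly one port per execution and each port accepts one instruction per cycle. Under that assumption, the optimum of the throughput linear program equals the load of the worst bottleneck: the maximum, over all non-empty port sets Q, of the number of instructions that can only run on ports in Q, divided by |Q|. The sketch enumerates port sets by brute force instead of calling an LP solver. The names `predicted_cycles` and `mean_relative_error`, the port numbers, and the measurements are hypothetical.

```python
from itertools import combinations


def predicted_cycles(mapping, experiment):
    """Predicted steady-state cycles per iteration of an instruction mix.

    mapping:    dict, instruction name -> frozenset of ports that can execute it
    experiment: dict, instruction name -> occurrences per loop iteration
    Assumption: one port slot per instruction, one instruction per port per cycle.
    """
    ports = sorted(set().union(*(mapping[i] for i in experiment)))
    worst = 0.0
    for size in range(1, len(ports) + 1):
        for subset in combinations(ports, size):
            q = set(subset)
            # Instructions whose allowed ports all lie in q must share q's ports.
            load = sum(n for i, n in experiment.items() if mapping[i] <= q)
            worst = max(worst, load / size)
    return worst


def mean_relative_error(mapping, measurements):
    """Fitness of a candidate mapping: lower is better.

    measurements: list of (experiment, measured cycles per iteration) pairs.
    """
    errors = [abs(predicted_cycles(mapping, exp) - t) / t for exp, t in measurements]
    return sum(errors) / len(errors)


# Hypothetical candidate mappings and made-up measurements, for illustration only.
good = {"add": frozenset({0, 1}), "mul": frozenset({0})}
bad = {"add": frozenset({0}), "mul": frozenset({0})}
measurements = [({"add": 2, "mul": 1}, 1.5), ({"add": 1, "mul": 2}, 2.0)]

print(mean_relative_error(good, measurements))
print(mean_relative_error(bad, measurements))
```

An evolutionary search would repeatedly mutate and recombine candidate mappings such as `good` and `bad`, keeping those with the lowest error. The enumeration is exponential in the number of ports, so it only suits the small port counts of this illustration.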
