A Performance Analysis of Vector Length Agnostic Code

2019 International Conference on High Performance Computing & Simulation (HPCS). Pub Date: 2019-07-01. DOI: 10.1109/HPCS48598.2019.9188238

Angela Pohl, Mirko Greese, Biagio Cosenza, B. Juurlink


Citations: 8

Abstract

Vector extensions are a popular means of exploiting data parallelism in applications. In recent years, the most commonly used extensions have grown in both vector length and number of vector instructions. However, code portability remains a problem when considering a compute continuum. Hence, vector length agnostic (VLA) architectures have been proposed for future generations of ARM and RISC-V processors. With these architectures, code is vectorized independently of the vector length of the target hardware platform. It is therefore possible to tune software to a generic vector length. To understand the performance impact of VLA code compared to vector length specific code, we analyze the current capabilities of code generation for ARM’s SVE architecture. Our experiments show that VLA code reaches about 90% of the performance of vector length specific code, i.e., a 10% overhead is incurred due to global predication of instructions. Furthermore, we show that code performance does not increase proportionally with vector length because of higher memory demands.
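To make the contrast between VLA code and vector length specific code concrete, the following is a minimal sketch that is not taken from the paper. It emulates both loop styles in plain Python rather than with SVE intrinsics. The helper `whilelt` is modeled on SVE's WHILELT predicate-generating instruction, and the names `daxpy_vla` and `daxpy_fixed` as well as the DAXPY kernel itself are assumptions chosen for illustration.

```python
def whilelt(i, n, vl):
    """Emulate a WHILELT-style predicate: lane k is active iff i + k < n."""
    return [i + k < n for k in range(vl)]


def daxpy_vla(a, x, y, vl):
    """Compute a*x + y with a single predicated loop that works for any vl.

    vl (lanes per vector) is a run-time value, as on VLA hardware.
    Hypothetical helper for illustration only.
    """
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        pred = whilelt(i, n, vl)  # one predicate governs the whole loop body
        for k, active in enumerate(pred):
            if active:  # inactive lanes are left untouched
                out[i + k] = a * x[i + k] + y[i + k]
        i += vl  # advance by the hardware vector length
    return out


def daxpy_fixed(a, x, y, vl=4):
    """Vector length specific variant: unpredicated main loop plus scalar tail.

    The lane count is fixed when the code is written (assumed to be 4 here).
    """
    n = len(x)
    out = list(y)
    main = n - n % vl
    for i in range(0, main, vl):  # full vectors only, no predicate needed
        for k in range(vl):
            out[i + k] = a * x[i + k] + y[i + k]
    for i in range(main, n):  # leftover elements handled one at a time
        out[i] = a * x[i] + y[i]
    return out
```

In the VLA variant every operation in the loop body is governed by the predicate, including iterations where all lanes are active. This is the kind of global predication the abstract names as the source of the roughly 10% overhead. The emulation only shows the control structure and says nothing about actual performance.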

