SIF: Overcoming the limitations of SIMD devices via implicit permutation

HPCA-16 2010: The Sixteenth International Symposium on High-Performance Computer Architecture
Pub Date: 2010-04-01
DOI: 10.1109/HPCA.2010.5416631

Libo Huang, Li Shen, Zhiying Wang, Wei Shi, Nong Xiao, Sheng Ma


Citations: 15

Abstract

SIMD devices have gained widespread acceptance in modern microprocessor designs because of their superior performance on multimedia applications. However, three limitations still hinder the efficient use of SIMD devices in general-purpose computer systems: memory alignment, data reorganization, and control flow. This paper presents SIF, an efficient SIMD interface framework that addresses these three shortcomings without modifying the existing ISA. It is designed around a permutation vector register file (PVRF), and it adds new extended instructions that set internal permutation state in the SIMD datapath rather than placing permutation state-setting bits in every instruction. The implicit permutation capability provided by the PVRF incurs zero overhead, so the three limitations can be handled without explicit permutation instructions. To further reduce the number of state-setting instructions in the SIMD datapath, a technique that moves this workload from the SIMD pipeline into the scalar pipeline is also introduced. With the help of the proposed compilation algorithm, SIF can efficiently transform regular SIMD code into SIF code, which makes it easy to integrate into all existing SIMD devices. We implemented these techniques in a vectorizing compiler, and experimental results show that most of the permutation overhead instructions can be eliminated and that a distinct performance speedup can be achieved: 37% higher on average than current SIMD techniques.
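
The difference between explicit and implicit permutation can be illustrated with a toy instruction-count model. The sketch below is not the paper's design: the class names, the `set_perm` operation, the single permutation pattern, and the four-lane width are all assumptions made for illustration. It only shows the general idea the abstract describes, namely that setting permutation state once and letting later SIMD operations apply it implicitly can remove a per-operation permute instruction.

```python
from dataclasses import dataclass, field

LANES = 4  # assumed vector width for this toy model


def permute(vec, pattern):
    """Return vec reordered so that lane i takes vec[pattern[i]]."""
    return [vec[i] for i in pattern]


@dataclass
class ExplicitSIMD:
    """Baseline model: every data reorganization is a separate instruction."""
    issued: int = 0

    def vperm(self, vec, pattern):
        self.issued += 1
        return permute(vec, pattern)

    def vadd(self, a, b):
        self.issued += 1
        return [x + y for x, y in zip(a, b)]


@dataclass
class ImplicitSIMD:
    """Hypothetical model: permutation state is set once, then applied
    implicitly to the second operand of every later vadd."""
    issued: int = 0
    pattern: list = field(default_factory=lambda: list(range(LANES)))

    def set_perm(self, pattern):
        # One state-setting instruction (hypothetical name).
        self.issued += 1
        self.pattern = list(pattern)

    def vadd(self, a, b):
        self.issued += 1
        b = permute(b, self.pattern)  # applied implicitly, no extra instruction
        return [x + y for x, y in zip(a, b)]


def run_explicit(rows, pattern):
    m = ExplicitSIMD()
    acc = [0] * LANES
    for r in rows:
        acc = m.vadd(acc, m.vperm(r, pattern))
    return acc, m.issued


def run_implicit(rows, pattern):
    m = ImplicitSIMD()
    m.set_perm(pattern)
    acc = [0] * LANES
    for r in rows:
        acc = m.vadd(acc, r)
    return acc, m.issued


if __name__ == "__main__":
    rows = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    reverse = [3, 2, 1, 0]
    print(run_explicit(rows, reverse))
    print(run_implicit(rows, reverse))
```

In this model both variants compute the same result, but the explicit version issues two instructions per row while the implicit version issues one per row plus a single state-setting instruction. The real framework's savings depend on hardware and compiler details that the abstract does not specify.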

