Extracting SIMD parallelism from 'for' loops

Proceedings of the ... ICPP Workshops on. International Conference on Parallel Processing Workshops Pub Date : 2001-09-03 DOI:10.1109/ICPPW.2001.951843

V. Gustin, P. Bulić


Citations: 9

Abstract

The demand for multimedia applications has prompted the addition of a multimedia instruction set (MMX) to most existing general-purpose microprocessors. The introduction of short single-instruction, multiple-data (SIMD), i.e. "vectorized", instructions to the microprocessor's "scalar" instruction set is supported by special hardware that enables the execution of one instruction on multiple data sets. Such a vectorized instruction set is primarily used in multimedia applications, and it seems likely that it will grow rapidly over the next few years. Thus, on the one hand, we have modern multimedia execution hardware; on the other, we have software and general-purpose compilers that are not able to exploit the multimedia instruction set automatically. In addition, the compiler is not able to locate SIMD parallelism within a basic block. Our solution to these problems is to find statement candidates in a program written in C/C++ (as we mainly use this language) and to employ the SIMD instruction set in the easiest possible way. Since the compiler cannot be changed or modified by the user, we can only extend the functionality of the program (compiler) through specialised library routines or through macros. We prefer the latter, because we believe that using a macro library is faster than making function calls, and we expect it to be simpler and friendlier for the user. The algorithm for identifying candidates for parallel processing (ICPP) is based on the fact that the program does not need any "correction" or "adaptation" prior to being analysed and, finally, translated into the SIMD instruction set. We define the macro library MacroVect.c as the substitution for the discovered statement candidates.
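To make the macro-substitution idea concrete, here is a minimal sketch of how a candidate 'for' loop might be replaced by a macro. It is not taken from MacroVect.c: the macro name `VEC_ADD_I16` and the helpers `add_scalar` and `add_simd` are hypothetical. The sketch also assumes SSE2 intrinsics (eight 16-bit lanes per register) in place of the MMX instructions the abstract mentions, and falls back to a plain loop on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics (assumed stand-in for MMX) */
#endif

/* Hypothetical macro, NOT from MacroVect.c. It stands in for the candidate loop
 *     for (i = 0; i < n; i++) dst[i] = a[i] + b[i];
 * over int16_t arrays. Being a macro, it expands inline with no call overhead. */
#if defined(__SSE2__)
#define VEC_ADD_I16(dst, a, b, n)                                          \
    do {                                                                   \
        size_t vi_ = 0;                                                    \
        /* Main part: eight 16-bit additions per instruction. */           \
        for (; vi_ + 8 <= (size_t)(n); vi_ += 8) {                         \
            __m128i va_ = _mm_loadu_si128((const __m128i *)((a) + vi_));   \
            __m128i vb_ = _mm_loadu_si128((const __m128i *)((b) + vi_));   \
            _mm_storeu_si128((__m128i *)((dst) + vi_),                     \
                             _mm_add_epi16(va_, vb_));                     \
        }                                                                  \
        /* Remainder: leftover elements are handled one at a time. */      \
        for (; vi_ < (size_t)(n); vi_++)                                   \
            (dst)[vi_] = (int16_t)((a)[vi_] + (b)[vi_]);                   \
    } while (0)
#else
/* Portable fallback: the original scalar loop. */
#define VEC_ADD_I16(dst, a, b, n)                                          \
    do {                                                                   \
        for (size_t vi_ = 0; vi_ < (size_t)(n); vi_++)                     \
            (dst)[vi_] = (int16_t)((a)[vi_] + (b)[vi_]);                   \
    } while (0)
#endif

/* The loop as the programmer originally wrote it (the "statement candidate"). */
void add_scalar(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}

/* The same routine after the candidate loop has been replaced by the macro. */
void add_simd(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    VEC_ADD_I16(dst, a, b, n);
}
```

Calling `add_simd(out, x, y, 19)` would process sixteen elements in two vector steps and the last three in the scalar remainder loop. The surrounding program needs no other changes, which is the property the abstract attributes to the macro approach.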

