Writing scalable SIMD programs with ISPC

WPMVP '14, Pub Date: 2014-02-16, DOI: 10.1145/2568058.2568065

James C. Brodman, Dmitry Babokin, I. Filippov, P. Tu


Citations: 7

Abstract

Modern processors contain many resources for parallel execution. In addition to having multiple cores, processors can also contain vector functional units that are capable of performing a single operation on multiple inputs in parallel. Taking advantage of this vector hardware is essential to obtaining peak performance on a machine, but it is often challenging for programmers to do so.

This paper presents a performance study of compiling several benchmarks from the domains of computer graphics, financial modeling, and high-performance computing for different vector instruction sets using the Intel SPMD Program Compiler (ispc), an alternative to compiler autovectorization of scalar code or handwriting vector code with intrinsics. ispc is both a language and a compiler that produces high-quality code for SIMD CPU vector extensions such as Intel Streaming SIMD Extensions (SSE), Intel Advanced Vector Extensions (AVX), or ARM NEON. We present the results of compiling the same ispc source program for various targets. The performance of the resulting ispc versions is compared to that of scalar C++ code, and we also examine the scalability of the benchmarks when targeting wider vector units.

