WPMVP '14: Latest Publications

High level transforms for SIMD and low-level computer vision algorithms
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568067
L. Lacassagne, D. Etiemble, A. Zahraee, A. Dominguez, P. Vezolle
Abstract: This paper presents a review of algorithmic transforms called High Level Transforms for IBM, Intel and ARM SIMD multicore processors to accelerate the implementation of low-level image processing algorithms. We show that these optimizations provide a significant acceleration. A first evaluation of the 512-bit SIMD Xeon Phi is also presented. We focus on the point that the combination of optimizations leading to the best execution time cannot be predicted, and thus, systematic benchmarking is mandatory. Once the best configuration is found for each architecture, a comparison of these performances is presented. The Harris points detection operator is selected as being representative of low-level image processing and computer vision algorithms. Being composed of five convolutions, it is more complex than a simple filter and enables more opportunities to combine optimizations. The presented work can scale across a wide range of codes using 2D stencils and convolutions.
Citations: 25
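
To make the "five convolutions" remark concrete, here is a minimal scalar sketch of a Harris corner response in plain Python. The abstract does not say which kernels the authors use, so the Sobel gradient kernels, the 3x3 binomial smoothing kernel, the constant k = 0.04, and all function names below are textbook assumptions, not the paper's implementation. None of the paper's High Level Transforms or SIMD optimizations are shown.

```python
# Assumed textbook kernels; the paper's actual choices are not given in the abstract.
SOBEL_X = [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]
SOBEL_Y = [[-1.0, -2.0, -1.0], [0.0, 0.0, 0.0], [1.0, 2.0, 1.0]]
SMOOTH = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]


def filter3x3(image, kernel):
    """Apply a 3x3 kernel to the interior pixels; border pixels stay 0.0.

    This is a correlation (the kernel is not flipped). For the Harris
    response that only changes the sign of both gradients, which cancels
    out in the products used below.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * image[r + dr][c + dc]
            out[r][c] = acc
    return out


def multiply(a, b):
    """Element-wise product of two images of equal size."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def harris_response(image, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 at every pixel."""
    # Convolutions 1 and 2: image gradients.
    ix = filter3x3(image, SOBEL_X)
    iy = filter3x3(image, SOBEL_Y)
    # Convolutions 3, 4 and 5: smoothing of the gradient products.
    sxx = filter3x3(multiply(ix, ix), SMOOTH)
    syy = filter3x3(multiply(iy, iy), SMOOTH)
    sxy = filter3x3(multiply(ix, iy), SMOOTH)
    height, width = len(image), len(image[0])
    return [
        [
            sxx[r][c] * syy[r][c] - sxy[r][c] ** 2 - k * (sxx[r][c] + syy[r][c]) ** 2
            for c in range(width)
        ]
        for r in range(height)
    ]


# Synthetic test image: a bright quadrant whose corner sits near (6, 6).
size = 12
image = [[1.0 if r >= 6 and c >= 6 else 0.0 for c in range(size)] for r in range(size)]
response = harris_response(image)
```

In this sketch the response is positive at the corner of the bright quadrant, negative along its straight edge, and zero in the flat background, which is the usual way the Harris operator separates corners from edges.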
A SIMD programming model for Dart, JavaScript, and other dynamically typed scripting languages
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568066
J. McCutchan, Haitao Feng, Nicholas D. Matsakis, Zachary R. Anderson, P. Jensen
Abstract: It has not been possible, in dynamically typed scripting languages, to take advantage of the SIMD co-processors available in all x86 and most ARM processors shipping today. Web browsers have become a mainstream platform to deliver large and complex applications with feature sets and performance comparable to native applications, and programmers must choose between Dart and JavaScript when writing web programs. This paper introduces an explicit SIMD programming model for Dart and JavaScript. We show that it can be compiled to efficient x86/SSE or ARM/NEON code by both Dart and JavaScript virtual machines, achieving a 300%-600% speed increase across a variety of benchmarks. The result of this work is that more sophisticated and performant applications can be built to run in web browsers. The ideas introduced in this paper can also be used in other dynamically typed scripting languages to provide a similarly performant interface to SIMD co-processors.
Citations: 7
OpenCL framework for ARM processors with NEON support
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568064
Gangwon Jo, W. J. Jeon, Wookeun Jung, Gordon Taft, Jaejin Lee
Abstract: State-of-the-art ARM processors provide multiple cores and SIMD instructions. OpenCL is a promising programming model for utilizing such parallel processing capability because of its SPMD programming model and built-in vector support. Moreover, it provides portability between multicore ARM processors and accelerators in embedded systems. In this paper, we introduce the design and implementation of an efficient OpenCL framework for multicore ARM processors. Computational tasks in a program are implemented as OpenCL kernels and run on all CPU cores in parallel by our OpenCL framework. Vector operations and built-in functions in OpenCL kernels are optimized using the NEON SIMD instruction set. We evaluate our OpenCL framework using 37 benchmark applications. The results show that our approach is effective and promising.
Citations: 21
Exploring the vectorization of Python constructs using Pythran and Boost.SIMD
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568060
S. Guelton, J. Falcou, Pierrick Brunet
Abstract: The Python language is highly dynamic, most notably due to late binding. As a consequence, programs using Python typically run an order of magnitude slower than their C counterparts. It is also a high-level language whose semantics can be made more static without much change from a user point of view in the case of mathematical applications. In that case, the language provides several vectorization opportunities that are studied in this paper, and evaluated in the context of Pythran, an ahead-of-time compiler that turns Python modules into C++ meta-programs.
Citations: 4
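
As a small illustration of what "making the semantics more static without much change for the user" can look like, the sketch below shows an ordinary Python function carrying a Pythran export annotation. The `#pythran export` comment is Pythran's documented way of declaring the exported signature; the function name `dot` and the choice of a dot product are assumptions made for this example, and the abstract does not say whether this particular construct is among those the paper studies.

```python
#pythran export dot(float list, float list)
def dot(xs, ys):
    """Dot product written as a plain Python reduction over two lists."""
    return sum(x * y for x, y in zip(xs, ys))
```

Because the annotation is only a comment, the file stays valid Python and `dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` can be called in the interpreter as usual. Assuming Pythran is installed and the file is saved under a hypothetical name such as `dot.py`, running `pythran dot.py` would compile it ahead of time into a native module.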
Sierra: a SIMD extension for C++
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568062
Roland Leißa, Immanuel Haffner, Sebastian Hack
Abstract: Nowadays, SIMD hardware is omnipresent in computers. Nonetheless, many software projects hardly make use of SIMD instructions: applications are usually written in general-purpose languages like C++. However, general-purpose languages only provide poor abstractions for SIMD programming, enforcing an error-prone, assembly-like programming style. Data-parallel languages are an alternative. They do offer more convenience when targeting SIMD architectures but introduce their own set of problems. In particular, programmers are often unwilling to port their working C++ code to a new programming language. In this paper we present Sierra: a SIMD extension for C++. It combines the full power of C++ with an intuitive and effective way to address SIMD hardware. With Sierra, the programmer can write efficient, portable and maintainable code. It is particularly easy to enhance existing code to run efficiently on SIMD machines. In contrast to prior approaches, the programmer has explicit control over the involved vector lengths.
Citations: 28
Writing scalable SIMD programs with ISPC
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568065
James C. Brodman, Dmitry Babokin, I. Filippov, P. Tu
Abstract: Modern processors contain many resources for parallel execution. In addition to having multiple cores, processors can also contain vector functional units that are capable of performing a single operation on multiple inputs in parallel. Taking advantage of this vector hardware is essential to obtaining peak performance on a machine, but it is often challenging for programmers to do so. This paper presents a performance study of compiling several benchmarks from the domains of computer graphics, financial modeling, and high-performance computing for different vector instruction sets using the Intel SPMD Program Compiler (ispc), an alternative to compiler autovectorization of scalar code or handwriting vector code with intrinsics. ispc is both a language and a compiler that produces high-quality code for SIMD CPU vector extensions such as Intel Streaming SIMD Extensions (SSE), Intel Advanced Vector Extensions (AVX), or ARM NEON. We present the results of compiling the same ispc source program for various targets. The performance of the resulting ispc versions is compared to that of scalar C++ code, and we also examine the scalability of the benchmarks when targeting wider vector units.
Citations: 7
SIMDizing pairwise sums: a summation algorithm balancing accuracy with throughput
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568070
Barnaby Dalton, Amy Wang, Bob Blainey
Abstract: Implementing summation when accuracy and throughput need to be balanced is a challenging endeavour. We present experimental results that provide a sense of when to start worrying and of the expense of the various solutions that exist. We also present a new algorithm based on pairwise summation that achieves 89% of the throughput of the fastest summation algorithms when the data is not resident in L1 cache, while eclipsing the accuracy of significantly slower compensated sums such as Kahan summation and Kahan-Babuska summation, which are typically used when accuracy is important.
Citations: 3
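
For readers unfamiliar with the algorithms named in this abstract, the following sketch contrasts naive left-to-right summation, Kahan compensated summation, and textbook recursive pairwise summation in plain Python. It is not the paper's SIMDized algorithm, whose details the abstract does not give; the function names, the base-case block size of 8, and the test data are assumptions chosen for illustration. `math.fsum` from the standard library serves as the accurate reference.

```python
import math


def naive_sum(values):
    """Left-to-right accumulation; rounding error can grow with the length."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation: carries the rounding error forward."""
    total = 0.0
    compensation = 0.0
    for v in values:
        y = v - compensation           # apply the correction from the last step
        t = total + y                  # low-order bits of y may be lost here
        compensation = (t - total) - y # recover what was lost
        total = t
    return total


def pairwise_sum(values, lo=0, hi=None, block=8):
    """Recursive pairwise summation with a small naive base case.

    The block size of 8 is an arbitrary assumption for this sketch.
    """
    if hi is None:
        hi = len(values)
    if hi - lo <= block:
        total = 0.0
        for i in range(lo, hi):
            total += values[i]
        return total
    mid = lo + (hi - lo) // 2
    return pairwise_sum(values, lo, mid, block) + pairwise_sum(values, mid, hi, block)


# One million copies of 0.1, a value that is not exactly representable in binary.
data = [0.1] * 1_000_000
reference = math.fsum(data)
errors = {
    "naive": abs(naive_sum(data) - reference),
    "kahan": abs(kahan_sum(data) - reference),
    "pairwise": abs(pairwise_sum(data) - reference),
}
```

On this input the naive error is visibly larger than the other two, which matches the usual error analysis: naive summation has an error bound that grows linearly with the number of terms, pairwise summation one that grows logarithmically, and Kahan summation one that is essentially independent of length.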
Simple, portable and fast SIMD intrinsic programming: Generic SIMD Library
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568059
Haichuan Wang, Peng Wu, Ilie Gabriel Tanase, M. Serrano, J. Moreira
Abstract: Using SIMD (Single Instruction Multiple Data) is a cost-effective way to explore data parallelism on modern processors. Most processor vendors today provide SIMD engines, such as Altivec/VSX for POWER, SSE/AVX for Intel processors, and NEON for ARM. While high-level SIMD programming models are rapidly evolving, for many SIMD developers the most effective way to get performance out of SIMD is still to program directly with vendor-provided SIMD intrinsics. However, intrinsics programming is both tedious and error-prone and, worst of all, produces non-portable code. This paper presents the Generic SIMD Library (https://github.com/genericsimd/generic_simd/), an open-source, portable C++ interface that provides an abstraction of short vectors and overloads most C/C++ operators for short vectors. The library provides several mappings from platform-specific intrinsics to the generic SIMD intrinsic interface so that code developed with the library is portable across different SIMD platforms. We have evaluated the library with several applications from the multimedia, data analytics and math domains. Compared with platform-specific intrinsics code, using the Generic SIMD Library results in fewer lines of code, a 22% reduction on average, and achieves performance similar to that of the platform-specific intrinsics versions.
Citations: 21
Vector seeker: a tool for finding vector potential
WPMVP '14 Pub Date : 2014-02-16 DOI: 10.1145/2568058.2568069
G. C. Evans, S. Abraham, B. Kuhn, D. Padua
Abstract: The importance of vector instructions is growing in modern computers. Almost all architectures include some form of vector instructions, and the tendency is for the size of the instructions to grow with newer designs. To take advantage of the performance that these systems offer, it is imperative that programs use these instructions, and yet they do not always do so. The tools to take advantage of these extensions require programmer assistance, either by hand coding or by providing hints to the compiler. We present Vector Seeker, a tool to help investigate vector parallelism in existing codes. Vector Seeker runs with the execution of a program to optimistically measure the vector parallelism that is present. Besides describing Vector Seeker, the paper also evaluates its effectiveness using two applications from Petascale Application Collaboration Teams (PACT) and eight applications from Media Bench II. These results are compared to known results from manual vectorization studies. Finally, we use the tool to automatically analyze codes from Numerical Recipes and TSVC and then compare the results with the automatic vectorization algorithms of Intel's ICC.
Citations: 11
Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips
WPMVP '14 Pub Date : 2014-01-29 DOI: 10.1145/2568058.2568068
Johannes Hofmann, Jan Treibig, G. Hager, G. Wellein
Abstract: Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the efficiency of different SIMD-vectorized implementations of the RabbitCT benchmark. RabbitCT performs 3D image reconstruction by back projection, a vital operation in computed tomography applications. The underlying algorithm is a challenge for vectorization because, apart from a streaming part, it also involves a bilinear interpolation requiring scattered access to image data. We analyze the performance of SSE (128 bit), AVX (256 bit), AVX2 (256 bit), and IMCI (512 bit) implementations on recent Intel x86 systems. A special emphasis is put on the vector gather implementation on Intel Haswell and Knights Corner microarchitectures. Finally, we discuss why GPU implementations perform much better for this specific algorithm.
Citations: 46
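
The "scattered access" mentioned in this abstract comes from the fact that each bilinear sample reads four neighbouring pixels at data-dependent positions. The scalar Python sketch below shows one such sample; it is a generic bilinear interpolation under assumed conventions (row-major image, zero outside the image, the hypothetical function name `bilinear_sample`), not the RabbitCT kernel itself.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a row-major 2D image at fractional column x and row y.

    Pixels outside the image are treated as 0.0 (an assumption of this sketch).
    """
    height = len(image)
    width = len(image[0])
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0  # horizontal weight of the right-hand neighbours
    fy = y - y0  # vertical weight of the lower neighbours

    def pixel(col, row):
        if 0 <= col < width and 0 <= row < height:
            return image[row][col]
        return 0.0

    # Four loads at positions that depend on (x, y): in a vectorized loop
    # these become gather operations, because neighbouring loop iterations
    # generally do not touch neighbouring memory locations.
    top = (1.0 - fx) * pixel(x0, y0) + fx * pixel(x0 + 1, y0)
    bottom = (1.0 - fx) * pixel(x0, y0 + 1) + fx * pixel(x0 + 1, y0 + 1)
    return (1.0 - fy) * top + fy * bottom


image = [
    [0.0, 10.0],
    [20.0, 30.0],
]
center = bilinear_sample(image, 0.5, 0.5)  # midway between all four pixels
corner = bilinear_sample(image, 1.0, 1.0)  # exactly on the lower-right pixel
```

Sampling midway between the four pixels returns their average, and sampling exactly on a pixel returns that pixel's value.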