L. Lacassagne, D. Etiemble, A. Zahraee, A. Dominguez, P. Vezolle
WPMVP '14, published 2014-02-16. DOI: 10.1145/2568058.2568067 (https://doi.org/10.1145/2568058.2568067)
High level transforms for SIMD and low-level computer vision algorithms
This paper reviews a set of algorithmic transforms, called High Level Transforms, that accelerate low-level image processing algorithms on IBM, Intel and ARM SIMD multicore processors. We show that these optimizations provide a significant speedup, and we present a first evaluation of the 512-bit SIMD Xeon Phi. A central point is that the combination of optimizations yielding the best execution time cannot be predicted in advance, so systematic benchmarking is mandatory. Once the best configuration has been found for each architecture, the resulting performance is compared across architectures. The Harris point detection operator is chosen as representative of low-level image processing and computer vision algorithms: because it is composed of five convolutions, it is more complex than a simple filter and offers more opportunities to combine optimizations. The presented work applies to a wide range of codes based on 2D stencils and convolutions.
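To make the "five convolutions" concrete, here is a minimal scalar sketch of a Harris response in plain Python. It is not the paper's SIMD implementation and applies none of the High Level Transforms. It assumes a common formulation: two 3x3 Sobel gradient convolutions followed by three 3x3 binomial smoothing convolutions of the gradient products. The kernel choices, the constant `k = 0.04`, and all function names are illustrative assumptions, not details taken from the abstract.

```python
# Scalar reference sketch of a Harris corner response (illustrative only).
# Assumed structure: 2 gradient convolutions + 3 smoothing convolutions = 5.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
GAUSS = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # unnormalized 3x3 binomial kernel


def conv3x3(img, kernel):
    """Apply a 3x3 stencil to the interior pixels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    acc += kernel[di][dj] * img[i + di - 1][j + dj - 1]
            out[i][j] = acc
    return out


def mul(a, b):
    """Pointwise product of two images of the same size."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def harris(img, k=0.04):
    """Return the Harris response map det(M) - k * trace(M)^2."""
    ix = conv3x3(img, SOBEL_X)          # convolution 1
    iy = conv3x3(img, SOBEL_Y)          # convolution 2
    sxx = conv3x3(mul(ix, ix), GAUSS)   # convolution 3
    sxy = conv3x3(mul(ix, iy), GAUSS)   # convolution 4
    syy = conv3x3(mul(iy, iy), GAUSS)   # convolution 5
    h, w = len(img), len(img[0])
    return [
        [
            sxx[i][j] * syy[i][j] - sxy[i][j] ** 2
            - k * (sxx[i][j] + syy[i][j]) ** 2
            for j in range(w)
        ]
        for i in range(h)
    ]


# Usage: a 10x10 image whose lower-right quadrant is bright has one corner.
img = [[1.0 if (i >= 5 and j >= 5) else 0.0 for j in range(10)]
       for i in range(10)]
response = harris(img)
```

On this test image the response is positive at the corner pixel `(5, 5)`, negative along the straight edge (for example at `(7, 5)`), and zero in the flat region. Each of the five `conv3x3` calls is a separate pass over the image, which is why a chain of this kind leaves room to combine optimizations.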