High performance implementation of 2D convolution using Intel's advanced vector extensions
Hossein Amiri, A. Shahbahrami
2017 Artificial Intelligence and Signal Processing Conference (AISP), 2017-10-01
DOI: 10.1109/AISP.2017.8324097 (https://doi.org/10.1109/AISP.2017.8324097)
Convolution is the most important and fundamental concept in multimedia processing. In digital image processing, for example, 2D convolution is used for a variety of filtering operations. It involves many arithmetic operations and is applied to every image pixel, which makes it a compute-intensive kernel. To improve its performance, this paper applies two approaches to vectorizing it, broadcasting of coefficients and repetition of coefficients, using the Intrinsic Programming Model (IPM) and AVX technology. Our experimental results on an Intel Skylake microarchitecture show that broadcasting of coefficients performs much better than repetition of coefficients across different filter sizes and image sizes. In addition, we evaluate Compiler Automatic Vectorization (CAV) with the GCC and LLVM compilers, as well as the OpenCV library, for this kernel. Our experimental results show that both IPM implementations are faster than GCC's and LLVM's auto-vectorization.
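The abstract names the broadcasting-of-coefficients approach without describing it, so the following is only a sketch of one plausible reading, not the authors' implementation. It assumes that each filter coefficient is copied into every lane of a vector register, multiplied with a vector of adjacent input pixels, and accumulated, so that one pass produces several adjacent output pixels. Plain Python lists stand in for 256-bit AVX registers; the function names `conv2d_scalar` and `conv2d_broadcast`, the `LANES` constant, and the choice of "valid" borders without kernel flipping are all assumptions made for illustration.

```python
LANES = 8  # assumption: a 256-bit AVX register holds 8 single-precision floats


def conv2d_scalar(image, kernel):
    """Reference filter: one output pixel at a time, 'valid' borders.

    The kernel is not flipped (correlation form), a simplification
    chosen for this sketch.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[y + i][x + j]
            out[y][x] = acc
    return out


def conv2d_broadcast(image, kernel):
    """Lane-wise model of the assumed broadcast-of-coefficients scheme.

    Each block of up to LANES adjacent output pixels shares one
    accumulator "register"; every coefficient is broadcast to all lanes.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x0 in range(0, out_w, LANES):
            n = min(LANES, out_w - x0)  # a shorter block models the row tail
            acc = [0.0] * n
            for i in range(kh):
                for j in range(kw):
                    # Broadcast one coefficient (cf. the _mm256_set1_ps intrinsic).
                    coeff = [kernel[i][j]] * n
                    # Contiguous pixel load (cf. the _mm256_loadu_ps intrinsic).
                    pixels = image[y + i][x0 + j:x0 + j + n]
                    # Lane-wise multiply and accumulate.
                    acc = [a + c * p for a, c, p in zip(acc, coeff, pixels)]
            out[y][x0:x0 + n] = acc
    return out
```

Both functions accumulate the products in the same order, so they return identical results for the same inputs. The sketch shows only the data-access pattern; it says nothing about the speedups reported in the paper.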