A study of the use of SIMD instructions for two image processing algorithms
E. Welch, D. Patru, E. Saber, K. Bengtson
2012 Western New York Image Processing Workshop, 2012-11-01
DOI: https://doi.org/10.1109/WNYIPW.2012.6466650
Most image processing algorithms are parallelizable, i.e., the calculation of one pixel does not depend on that of another. SIMD architectures, including Intel's WMMX and SSE and ARM's NEON, can exploit this fact by processing multiple pixels at a time, which can result in significant speedups. This study investigates the use of NEON SIMD instructions for two image processing algorithms. Both algorithms are altered to process four pixels at a time, for which a theoretical speedup factor of four can be achieved. In addition, parts of the original implementation have been replaced with inline functions or modified at the assembly code level. Experimental benchmark data show the actual execution speed to be two to three times higher than the original reference. These results indicate that SIMD instructions can significantly speed up image processing algorithms through proper code manipulations.
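The abstract does not name the two algorithms, so the following is only a minimal sketch of the general idea of processing four pixels per iteration with NEON. It assumes a hypothetical per-pixel gain operation on 32-bit float pixels; the function names `scale_pixels_scalar` and `scale_pixels_simd` are illustrative and are not taken from the paper. The NEON path is guarded by the compiler's `__ARM_NEON` macro, so the same source falls back to the scalar loop on other targets.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Scalar reference: one pixel per iteration.
 * (Hypothetical operation: multiply every pixel by a gain.) */
void scale_pixels_scalar(const float *src, float *dst, size_t n, float gain)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * gain;
}

/* SIMD variant: four pixels per iteration when NEON is available.
 * Each pixel is independent of the others, so four can share one
 * load, one multiply, and one store. */
void scale_pixels_simd(const float *src, float *dst, size_t n, float gain)
{
    size_t i = 0;

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t px = vld1q_f32(src + i); /* load 4 pixels          */
        px = vmulq_n_f32(px, gain);          /* multiply all 4 by gain */
        vst1q_f32(dst + i, px);              /* store 4 results        */
    }
#endif

    /* Remaining pixels (or all of them when NEON is unavailable). */
    for (; i < n; ++i)
        dst[i] = src[i] * gain;
}
```

Calling both functions on the same buffer and comparing the outputs is a simple way to check the vectorized path against the reference. The scalar tail loop handles image sizes that are not a multiple of four, which is one of several overheads that can keep a measured speedup below the theoretical factor of four.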