Manuel Zumbado-Corrales, J. Castro, Esteban Meneses
2021 IEEE 3rd International Conference on BioInspired Processing (BIP), 2021-11-04. DOI: 10.1109/BIP53678.2021.9612827
A Comparative Evaluation of Modern Architectures for the Non-Local Means Filter using Performance Primitives Libraries and Compiler Directive APIs
The performance an application achieves is limited by architectural features and by the program's data access and processing patterns. Parallelization approaches exhibit dissimilar performance and have a direct impact on application execution time. Moreover, developing parallel code adds complexity and reduces programmer productivity, because the program must be accelerated or rewritten. In this paper, we present a comparative performance evaluation of CPU, GPU, and many-core (Xeon Phi KNL) architectures for the Non-Local Means filter. We assess the effect of different data access and processing patterns in two computational optimizations developed for this filter. We follow a top-down parallelization strategy: performance primitives libraries come first, providing easy drop-in acceleration, and compiler directives through frameworks such as OpenMP and OpenACC follow as an intermediate step to map computing tasks to the underlying hardware. Results show that both libraries and directives are effective at accelerating code, and that combining the two is necessary to overcome performance bottlenecks.
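The abstract does not spell out the filter itself. For orientation, here is a minimal sketch of the classic pixel-wise Non-Local Means filter with a single OpenMP directive on the outer loops, corresponding to the "compiler directives" step described above. It is not the authors' code and does not reproduce either of their two optimizations. It assumes a grayscale, row-major float image, replicated (clamped) borders, a mean squared patch distance, and weights of the form exp(-d²/h²). The names `Image`, `patch_distance`, and `nlm_filter` and the parameters `search_radius`, `patch_radius`, and `h` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical row-major grayscale image used only for this sketch.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;  // width * height values

    // Replicated (clamped) borders: an assumed boundary policy.
    float at(int x, int y) const {
        x = std::min(std::max(x, 0), width - 1);
        y = std::min(std::max(y, 0), height - 1);
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Mean squared difference between the patches centred at (x1, y1) and (x2, y2).
float patch_distance(const Image& img, int x1, int y1, int x2, int y2, int r) {
    float sum = 0.0f;
    for (int dy = -r; dy <= r; ++dy) {
        for (int dx = -r; dx <= r; ++dx) {
            const float d = img.at(x1 + dx, y1 + dy) - img.at(x2 + dx, y2 + dy);
            sum += d * d;
        }
    }
    const int n = (2 * r + 1) * (2 * r + 1);
    return sum / static_cast<float>(n);
}

// Each output pixel is a weighted average of the pixels in its search window,
// weighted by how similar their surrounding patches are.
Image nlm_filter(const Image& in, int search_radius, int patch_radius, float h) {
    assert(in.width > 0 && in.height > 0);
    assert(in.pixels.size() == static_cast<std::size_t>(in.width) * in.height);
    assert(search_radius >= 0 && patch_radius >= 0 && h > 0.0f);

    Image out{in.width, in.height, std::vector<float>(in.pixels.size(), 0.0f)};
    const float inv_h2 = 1.0f / (h * h);

    // Output pixels are independent, so the two outer loops can be shared
    // among threads. Without -fopenmp the pragma is ignored and the loops run
    // serially with the same result.
    #pragma omp parallel for collapse(2) schedule(static)
    for (int y = 0; y < in.height; ++y) {
        for (int x = 0; x < in.width; ++x) {
            float weight_sum = 0.0f;
            float value_sum = 0.0f;
            const int y0 = std::max(0, y - search_radius);
            const int y1 = std::min(in.height - 1, y + search_radius);
            const int x0 = std::max(0, x - search_radius);
            const int x1 = std::min(in.width - 1, x + search_radius);
            for (int sy = y0; sy <= y1; ++sy) {
                for (int sx = x0; sx <= x1; ++sx) {
                    const float d2 = patch_distance(in, x, y, sx, sy, patch_radius);
                    const float w = std::exp(-d2 * inv_h2);
                    weight_sum += w;
                    value_sum += w * in.at(sx, sy);
                }
            }
            // weight_sum >= 1: the centre pixel matches itself with distance 0.
            out.pixels[static_cast<std::size_t>(y) * in.width + x] = value_sum / weight_sum;
        }
    }
    return out;
}
```

Compiling with `-fopenmp` (GCC or Clang) enables the directive. The nested search-window and patch loops make the per-pixel cost grow with both radii, which is why data access patterns matter so much for this filter. A GPU variant under OpenACC would typically swap the pragma for `#pragma acc parallel loop collapse(2)` and add data clauses to move the image to the device; that port is likewise only indicative.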