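A Comparative Evaluation of Modern Architectures for the Non-Local Means Filter using Performance Primitives Libraries and Compiler Directive APIs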

2021 IEEE 3rd International Conference on BioInspired Processing (BIP) · Pub Date: 2021-11-04 · DOI: 10.1109/BIP53678.2021.9612827

Manuel Zumbado-Corrales, J. Castro, Esteban Meneses



Abstract



The performance achieved by an application is limited by architectural features and by the program's data access and processing patterns. Parallelization approaches differ in performance and have a direct impact on application execution time. In addition, developing parallel code adds complexity and costs programmer productivity, whether the program is accelerated in place or rewritten. In this paper, we present a comparative performance evaluation of CPU, GPU, and many-core (Xeon Phi KNL) architectures for the Non-Local Means filter. We assess the effect of different data access and processing patterns in two computational optimizations developed for this filter. We follow a top-down approach to parallelization: we start with performance primitives, which give easy drop-in acceleration, and then move to compiler directives with frameworks such as OpenMP and OpenACC as an intermediate step to map computing tasks to the underlying hardware. Results show that both libraries and directives are effective at accelerating code, and that a combination of the two is necessary to overcome performance bottlenecks.
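For readers unfamiliar with the filter, the following is a minimal sketch of the textbook pixel-wise Non-Local Means formulation, written in plain Python for readability. It is not the authors' implementation, and it does not reproduce the two optimizations the abstract mentions, since the abstract does not describe them. The function name `nlm_filter` and the parameters `patch_radius`, `search_radius`, and `h` are assumptions chosen for illustration.

```python
import math


def nlm_filter(img, patch_radius=1, search_radius=2, h=0.5):
    """Textbook pixel-wise Non-Local Means on a 2D list of floats.

    Illustrative sketch only: names and default parameters are assumptions,
    not taken from the paper.
    """
    rows, cols = len(img), len(img[0])

    def pixel(r, c):
        # Clamp coordinates so patches near the border reuse edge pixels.
        r = min(max(r, 0), rows - 1)
        c = min(max(c, 0), cols - 1)
        return img[r][c]

    def patch_distance(r1, c1, r2, c2):
        # Mean squared difference between the patches centred on two pixels.
        total = 0.0
        for dr in range(-patch_radius, patch_radius + 1):
            for dc in range(-patch_radius, patch_radius + 1):
                diff = pixel(r1 + dr, c1 + dc) - pixel(r2 + dr, c2 + dc)
                total += diff * diff
        return total / (2 * patch_radius + 1) ** 2

    out = [[0.0] * cols for _ in range(rows)]
    # Every output pixel reads only the input image, so the iterations of
    # these two outer loops are independent of one another. In a C version,
    # this is the loop nest a directive such as OpenMP's
    # "#pragma omp parallel for" would typically be applied to.
    for r in range(rows):
        for c in range(cols):
            weight_sum = 0.0
            value_sum = 0.0
            # Compare the patch at (r, c) with every patch in the search window.
            for sr in range(max(r - search_radius, 0),
                            min(r + search_radius, rows - 1) + 1):
                for sc in range(max(c - search_radius, 0),
                                min(c + search_radius, cols - 1) + 1):
                    # Similar patches get weights near 1, dissimilar ones near 0.
                    w = math.exp(-patch_distance(r, c, sr, sc) / (h * h))
                    weight_sum += w
                    value_sum += w * img[sr][sc]
            # weight_sum >= 1 because the pixel's own patch has distance 0.
            out[r][c] = value_sum / weight_sum
    return out


if __name__ == "__main__":
    noisy = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    for row in nlm_filter(noisy):
        print(row)
```

The sketch makes the access pattern visible: each output pixel reads a window of overlapping patches from the input, so neighbouring pixels reuse much of the same data. How that reuse maps onto caches and memory hierarchies differs across CPU, GPU, and many-core hardware, which is the kind of effect the evaluation compares.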
