Portability and Performance Assessment of the Non-Negative Matrix Factorization Algorithm with OpenMP and SYCL
Authors: Youssef Faqir-Rhazoui, Carlos García, F. Tirado
Published in: 2022 XVLIII Latin American Computer Conference (CLEI), 2022-10-17
DOI: https://doi.org/10.1109/CLEI56649.2022.9959906
The SYCL standard was released to improve code portability across heterogeneous environments. Intel's oneAPI toolkit includes the Data-Parallel C++ (DPC++) compiler, which is Intel's implementation of SYCL. SYCL is designed so that a single source code can target multiple kinds of accelerator, such as multi-core CPUs, GPUs, and even FPGAs. The C/C++ compiler provided in the oneAPI toolkit also supports OpenMP, which likewise allows code to target both CPU and GPU devices. In this paper, the performance of SYCL and OpenMP is evaluated using the well-known non-negative matrix factorization (NMF) algorithm. Three NMF implementations are developed (a baseline, a SYCL version, and an OpenMP version) to analyze the acceleration on CPU and GPU. Experimental results show that the two programming models perform almost identically on CPU, while on GPU SYCL slightly outperforms its OpenMP counterpart.
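To make the kind of kernel under discussion concrete, the following is a minimal C++ sketch of one NMF iteration parallelized on the CPU with OpenMP. The abstract does not say which update rule, data layout, or directives the authors used, so everything here is an assumption: the sketch uses the widely known Lee-Seung multiplicative updates for the Frobenius-norm objective, row-major `std::vector<double>` matrices, and hypothetical helper names (`matmul`, `transpose`, `nmf_step`, `frobenius_error`). It is not the paper's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix; a hypothetical representation chosen for brevity.
using Matrix = std::vector<double>;

// Returns C (m x n) = A (m x k) * B (k x n). Rows of C are independent,
// so the outer loop is shared among OpenMP threads.
Matrix matmul(const Matrix& A, const Matrix& B, int m, int k, int n) {
    Matrix C(static_cast<std::size_t>(m) * n, 0.0);
    #pragma omp parallel for
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int p = 0; p < k; ++p) sum += A[i * k + p] * B[p * n + j];
            C[i * n + j] = sum;
        }
    }
    return C;
}

// Returns the transpose (n x m) of A (m x n).
Matrix transpose(const Matrix& A, int m, int n) {
    Matrix T(A.size());
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) T[j * m + i] = A[i * n + j];
    return T;
}

// One multiplicative-update iteration for V (m x n) ~ W (m x k) * H (k x n).
// Assumed rule (Lee-Seung, Frobenius objective):
//   H <- H .* (W^T V) ./ (W^T W H + eps)
//   W <- W .* (V H^T) ./ (W H H^T + eps)
void nmf_step(const Matrix& V, Matrix& W, Matrix& H, int m, int n, int k) {
    assert(V.size() == static_cast<std::size_t>(m) * n);
    assert(W.size() == static_cast<std::size_t>(m) * k);
    assert(H.size() == static_cast<std::size_t>(k) * n);
    const double eps = 1e-9;  // guards against division by zero

    const Matrix Wt = transpose(W, m, k);                         // k x m
    const Matrix WtV = matmul(Wt, V, k, m, n);                    // k x n
    const Matrix WtWH = matmul(matmul(Wt, W, k, m, k), H, k, k, n);
    for (std::size_t i = 0; i < H.size(); ++i)
        H[i] *= WtV[i] / (WtWH[i] + eps);

    const Matrix Ht = transpose(H, k, n);                         // n x k
    const Matrix VHt = matmul(V, Ht, m, n, k);                    // m x k
    const Matrix WHHt = matmul(W, matmul(H, Ht, k, n, k), m, k, k);
    for (std::size_t i = 0; i < W.size(); ++i)
        W[i] *= VHt[i] / (WHHt[i] + eps);
}

// Squared Frobenius norm of the residual V - W * H.
double frobenius_error(const Matrix& V, const Matrix& W, const Matrix& H,
                       int m, int n, int k) {
    const Matrix WH = matmul(W, H, m, k, n);
    double err = 0.0;
    for (std::size_t i = 0; i < V.size(); ++i) {
        const double d = V[i] - WH[i];
        err += d * d;
    }
    return err;
}
```

Calling `nmf_step` repeatedly on non-negative initial `W` and `H` keeps both factors non-negative, and `frobenius_error` can be used to monitor convergence. Without OpenMP enabled at compile time the pragma is ignored and the code runs serially, which roughly corresponds to a baseline version. A GPU variant would need OpenMP `target` offload directives or a SYCL rewrite of the loops, neither of which is shown here.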