Portability and Performance Assessment of the Non-Negative Matrix Factorization Algorithm with OpenMP and SYCL
Authors: Youssef Faqir-Rhazoui, Carlos García, F. Tirado
Published in: 2022 XVLIII Latin American Computer Conference (CLEI), 2022-10-17
DOI: 10.1109/CLEI56649.2022.9959906 (https://doi.org/10.1109/CLEI56649.2022.9959906)
Abstract
The SYCL standard was released to improve code portability across heterogeneous environments. Intel's oneAPI toolkit includes the Data-Parallel C++ (DPC++) compiler, which is Intel's SYCL implementation. SYCL is designed so that a single source code can target multiple devices, such as multi-core CPUs, GPUs, and even FPGAs. Additionally, the C/C++ compiler provided in the oneAPI toolkit supports OpenMP, which also allows code to target both CPU and GPU devices. In this paper, the performance of SYCL and OpenMP is evaluated using the well-known non-negative matrix factorization (NMF) algorithm. Three NMF implementations are developed (a baseline, a SYCL version, and an OpenMP version) to analyze the acceleration on CPU and GPU. Experimental results show that the two programming models perform almost identically on CPU, while on GPU SYCL slightly outperforms its OpenMP counterpart.
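The abstract does not say which NMF variant the authors implement. As a rough illustration only, the sketch below assumes the common multiplicative-update rule for the Frobenius-norm objective, factoring a non-negative matrix V (n x m) into W (n x k) and H (k x m), and parallelizes the loops on the CPU with OpenMP `parallel for`. The `Matrix` type and the function names `matmul`, `transpose`, `scale_inplace`, `nmf_step`, and `reconstruction_error` are hypothetical helpers introduced here, not taken from the paper, and the code makes no attempt to reproduce the paper's GPU offloading or its performance.

```cpp
#include <cstddef>
#include <vector>

// Dense row-major matrix (hypothetical helper type, not from the paper).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double init = 0.0)
        : rows(r), cols(c), data(r * c, init) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = A * B. Rows of C are independent, so the outer loop is parallelized.
Matrix matmul(const Matrix& A, const Matrix& B) {
    Matrix C(A.rows, B.cols);
    #pragma omp parallel for
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t k = 0; k < A.cols; ++k)
            for (std::size_t j = 0; j < B.cols; ++j)
                C.at(i, j) += A.at(i, k) * B.at(k, j);
    return C;
}

// Returns the transpose of A.
Matrix transpose(const Matrix& A) {
    Matrix T(A.cols, A.rows);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t j = 0; j < A.cols; ++j)
            T.at(j, i) = A.at(i, j);
    return T;
}

// Element-wise update X <- X .* num ./ (den + eps).
// eps guards against division by zero (an assumption of this sketch).
void scale_inplace(Matrix& X, const Matrix& num, const Matrix& den, double eps) {
    #pragma omp parallel for
    for (std::size_t idx = 0; idx < X.data.size(); ++idx)
        X.data[idx] *= num.data[idx] / (den.data[idx] + eps);
}

// One multiplicative-update iteration (assumed variant):
//   H <- H .* (W^T V) ./ (W^T W H),  then  W <- W .* (V H^T) ./ (W H H^T)
void nmf_step(const Matrix& V, Matrix& W, Matrix& H, double eps = 1e-9) {
    Matrix Wt = transpose(W);
    scale_inplace(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps);
    Matrix Ht = transpose(H);
    scale_inplace(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps);
}

// Squared Frobenius norm of V - W * H.
double reconstruction_error(const Matrix& V, const Matrix& W, const Matrix& H) {
    Matrix WH = matmul(W, H);
    double err = 0.0;
    for (std::size_t idx = 0; idx < V.data.size(); ++idx) {
        double d = V.data[idx] - WH.data[idx];
        err += d * d;
    }
    return err;
}
```

Calling `nmf_step` repeatedly on positively initialized W and H should drive `reconstruction_error` down while keeping every entry non-negative. Built without OpenMP support, the pragmas are ignored and the same code runs serially, which is one reason directive-based models are often considered convenient for incremental porting.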