Kumudha Narasimhan, Ouadie El Farouki, M. Goli, Muhammad Tanvir, S. Georgiev, Isaac Ault
Published in: 2022 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC), 2022-11-01
DOI: https://doi.org/10.1109/P3HPC56579.2022.00016
Towards performance portability of AI graphs using SYCL
The wide adoption of Deep Neural Networks (DNNs) has served as an incentive to design and manufacture powerful, specialized hardware, targeting systems from edge devices to cloud and supercomputers. While ONNX, proposed as a de facto standard for DNN model description, provides portability across various AI frameworks, supporting DNN models on various hardware architectures remains challenging. SYCL provides a C++-based portable parallel programming model to target various devices. Thus, enabling a SYCL backend for an AI framework can lead to a hardware-agnostic model for heterogeneous systems. This paper proposes a SYCL backend for ONNXRuntime as a possible solution towards the performance portability of deep learning algorithms. The proposed backend uses the existing state-of-the-art SYCL-DNN and SYCL-BLAS libraries to invoke tuned SYCL kernels for DNN operations. Our performance evaluation shows that the proposed approach can achieve performance comparable to that of state-of-the-art, optimized vendor-specific libraries.