Profile Guided Optimization Transfer-Learning for OpenCL/SYCL Kernel Compilation and Runtime
Authors: Wenju He, Maosu Zhao, Yuxin Zou, Feng Zou
Published in: Proceedings of the 2023 International Workshop on OpenCL
Publication date: 2023-04-18
DOI: 10.1145/3585341.3585359 (https://doi.org/10.1145/3585341.3585359)
Citation count: 0
Abstract
Reducing SYCL kernel compilation time and runtime overhead is important for heterogeneous computing performance. Profile-Guided Optimization (PGO) is a technique widely used in compilers to optimize code based on profiling data. We apply PGO to both SYCL kernel compilation and the backend runtime. The first experiment demonstrates transfer learning: profiling data collected from the SPEC CPU® 2006 benchmark can benefit kernel compilation on OpenCL/SYCL benchmarks. The second experiment also demonstrates transfer learning: profiling data collected from some OpenCL/SYCL benchmarks can be used to reduce CPU backend runtime overhead in unseen benchmarks.
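The abstract does not describe the toolchain or build steps, so the following is only a minimal sketch of a generic instrumentation-based PGO workflow, assuming a Clang/LLVM-style toolchain. It shows the three usual stages: build with instrumentation, merge raw profiles gathered from training workloads, and rebuild with the merged profile. The helper name pgo_plan and all file names are hypothetical and are not taken from the paper; the script only constructs the command lines and does not run them.

```python
from typing import List, Sequence


def pgo_plan(
    sources: Sequence[str],
    out: str,
    raw_profiles: Sequence[str],
    profile_dir: str = "profiles",        # hypothetical directory name
    merged: str = "merged.profdata",      # hypothetical file name
) -> List[List[str]]:
    """Return the command lines for a three-stage PGO build (not executed)."""
    # Stage 1: build an instrumented binary that writes raw profiles when run.
    instrument = ["clang++", "-O2", f"-fprofile-generate={profile_dir}",
                  *sources, "-o", f"{out}.instr"]
    # Stage 2: merge the raw profiles produced by the training workloads.
    # For transfer learning, these come from workloads other than the
    # ones the optimized binary is later evaluated on.
    merge = ["llvm-profdata", "merge", "-o", merged, *raw_profiles]
    # Stage 3: rebuild the same sources, guided by the merged profile.
    optimize = ["clang++", "-O2", f"-fprofile-use={merged}",
                *sources, "-o", out]
    return [instrument, merge, optimize]


if __name__ == "__main__":
    for cmd in pgo_plan(["tool.cpp"], "tool",
                        ["train_a.profraw", "train_b.profraw"]):
        print(" ".join(cmd))
```

In this sketch the transfer-learning aspect lies entirely in which workloads produce the raw profiles passed to stage 2: the optimized binary is then used on benchmarks that contributed no profile data.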