Runtime Support for Multiple Offload-Based Programming Models on Embedded Manycore Accelerators
Alessandro Capotondi, Germain Haugou, A. Marongiu, L. Benini
Proceedings of the 2015 International Workshop on Code Optimisation for Multi and Many Cores, published 2015-02-08
DOI: 10.1145/2723772.2723773 (https://doi.org/10.1145/2723772.2723773)
Citations: 1
Abstract
Many modern high-end embedded systems are designed as heterogeneous systems-on-chip (SoCs), where a powerful general-purpose multicore host processor is coupled to a manycore accelerator. The host executes legacy applications on top of standard operating systems, while the accelerator runs highly parallel code kernels within those applications. Several programming models are currently being proposed to program such accelerator-based systems, OpenCL and OpenMP being the most relevant examples. In the near future, it will be common to have multiple applications, coded with different programming models, concurrently requiring the use of the manycore accelerator. In this paper, we present a runtime system for a cluster-based manycore accelerator, optimized for the concurrent execution of OpenMP and OpenCL kernels. The runtime supports spatial partitioning of the manycore, where clusters can be grouped into several "virtual" accelerator instances. Our runtime design is modular and relies on a "generic" component for resource (cluster) scheduling, plus "specialized" components that efficiently deploy generic offload requests into an implementation of the target programming model's semantics. We evaluate the proposed runtime system on a real heterogeneous system, the STMicroelectronics STHORM development board.
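The split between a "generic" cluster scheduler and "specialized" per-model components can be pictured with a small sketch. This is not the authors' runtime: it is a toy host-side model in Python, and every name in it (`ClusterScheduler`, `VirtualAccelerator`, `deploy_openmp`, `deploy_opencl`, `offload`), the four-cluster count, and the deployment policies are assumptions made only for illustration. It shows one way spatial partitioning might look: the generic part hands out disjoint groups of clusters, and a model-specific part decides what each cluster in the group does.

```python
from dataclasses import dataclass


@dataclass
class VirtualAccelerator:
    """A group of clusters reserved for one offload (hypothetical type)."""
    model: str
    clusters: list


class ClusterScheduler:
    """Generic component: knows only which clusters are free, not the model."""

    def __init__(self, num_clusters):
        self.free = list(range(num_clusters))

    def allocate(self, model, wanted):
        # Spatial partitioning: carve `wanted` clusters out of the free pool.
        if wanted <= 0 or wanted > len(self.free):
            return None  # not enough clusters; the caller decides what to do
        taken, self.free = self.free[:wanted], self.free[wanted:]
        return VirtualAccelerator(model, taken)

    def release(self, vacc):
        # Return the clusters of a finished offload to the free pool.
        self.free = sorted(self.free + vacc.clusters)
        vacc.clusters = []


def deploy_openmp(vacc, kernel):
    # Assumed policy: the first cluster hosts the master, the rest join its team.
    master, workers = vacc.clusters[0], vacc.clusters[1:]
    plan = [f"cluster {master}: start {kernel} as OpenMP master"]
    plan += [f"cluster {c}: join OpenMP team for {kernel}" for c in workers]
    return plan


def deploy_opencl(vacc, kernel):
    # Assumed policy: every cluster executes work-groups of the same kernel.
    return [f"cluster {c}: run work-groups of {kernel}" for c in vacc.clusters]


# Specialized components, selected by the programming model of the request.
DEPLOYERS = {"openmp": deploy_openmp, "opencl": deploy_opencl}


def offload(scheduler, model, kernel, wanted):
    """Generic offload request: allocate clusters, then deploy per model."""
    vacc = scheduler.allocate(model, wanted)
    if vacc is None:
        return None, []
    return vacc, DEPLOYERS[model](vacc, kernel)
```

In this sketch, two calls to `offload` on a four-cluster scheduler, one for an OpenMP kernel and one for an OpenCL kernel with two clusters each, yield two disjoint virtual accelerators that could run side by side; a third request finds no free cluster until `release` is called. Keeping the scheduler unaware of the programming model is the point of the design: supporting another model would only require another entry in `DEPLOYERS`.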