{"title":"Enabling zero-copy OpenMP offloading on the PULP many-core accelerator","authors":"Alessandro Capotondi, A. Marongiu","doi":"10.1145/3078659.3079071","DOIUrl":null,"url":null,"abstract":"Many-core heterogeneous designs are nowadays widely available among embedded systems. Initiatives such as the HSA push for a model where the host processor and the accelerator(s) communicate via coherent, Unified Virtual Memory (UVM). In this paper we describe our experience in porting the OpenMP v4 programming model to a low-end, heterogeneous embedded system based on the PULP many-core accelerator featuring lightweight (software-managed) UVM support. We describe a GCC-based toolchain which enables: i) the automatic generation of host and accelerator binaries from a single, high-level, OpenMP parallel program; ii) the automatic instrumentation of the accelerator program to transparently manage UVM. This enables up to 4x faster execution compared to traditional copy-based offload mechanisms.","PeriodicalId":240210,"journal":{"name":"Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 20th International Workshop on Software and Compilers for Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3078659.3079071","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
Many-core heterogeneous designs are nowadays widely available among embedded systems. Initiatives such as the HSA push for a model where the host processor and the accelerator(s) communicate via coherent, Unified Virtual Memory (UVM). In this paper we describe our experience in porting the OpenMP v4 programming model to a low-end, heterogeneous embedded system based on the PULP many-core accelerator featuring lightweight (software-managed) UVM support. We describe a GCC-based toolchain which enables: i) the automatic generation of host and accelerator binaries from a single, high-level, OpenMP parallel program; ii) the automatic instrumentation of the accelerator program to transparently manage UVM. This enables up to 4x faster execution compared to traditional copy-based offload mechanisms.