Extensions over OpenCL for Latency Reduction and Critical Applications
Grigore Lupescu, E. Slusanschi, N. Tapus
2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 379-385
Published: 2015-09-21
DOI: 10.1109/SYNASC.2015.64 (https://doi.org/10.1109/SYNASC.2015.64)
Citation count: 0
Abstract
The complexity of the hardware and software stack makes programming GPGPUs difficult and limits application portability. This article first discusses the challenges imposed by the current hardware and software model of GPGPU systems, which relies heavily on the HOST device (CPU). We then identify system bottlenecks in both the hardware design and the software stack, and present two ideas for extending the HOST and DEVICE sides of the OpenCL API, with the aim of reducing latency and improving device safety. Our first goal is to reduce HOST-side latency using user synchronization directives. Our second goal is to reduce DEVICE-side latency and add safety through a software layer that manages kernel execution. We present concrete performance results for both HOST-side and DEVICE-side latency reduction.