{"title":"用于OpenCL应用程序的通信库重叠计算和通信","authors":"T. Komoda, Shinobu Miwa, Hiroshi Nakamura","doi":"10.1109/IPDPSW.2012.68","DOIUrl":null,"url":null,"abstract":"User-friendly parallel programming environments, such as CUDA and OpenCL are widely used for accelerators. They provide programmers with useful APIs, but the APIs are still low level primitives. Therefore, in order to apply communication optimization techniques, such as double buffering techniques, programmers have to manually write the programs with the primitives. Manual communication optimization requires programmers to have significant knowledge of both application characteristics and CPU-accelerator architecture. This prevents many application developers from effective utilization of accelerators. In addition, managing communication is a tedious and error-prone task even for expert programmers. Thus, it is necessary to develop a communication system which is highly abstracted but still capable of optimization. For this purpose, this paper proposes an OpenCL based communication library. To maximize performance improvement, the proposed library provides a simple but effective programming interface based on Stream Graph in order to specify an applications communication pattern. We have implemented a prototype system on OpenCL platform and applied it to several image processing applications. Our evaluation shows that the library successfully masks the details of accelerator memory management while it can achieve comparable speedup to manual optimization in which we use existing low level interfaces.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"41 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Communication Library to Overlap Computation and Communication for OpenCL Application\",\"authors\":\"T. Komoda, Shinobu Miwa, Hiroshi Nakamura\",\"doi\":\"10.1109/IPDPSW.2012.68\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"User-friendly parallel programming environments, such as CUDA and OpenCL are widely used for accelerators. They provide programmers with useful APIs, but the APIs are still low level primitives. Therefore, in order to apply communication optimization techniques, such as double buffering techniques, programmers have to manually write the programs with the primitives. Manual communication optimization requires programmers to have significant knowledge of both application characteristics and CPU-accelerator architecture. This prevents many application developers from effective utilization of accelerators. In addition, managing communication is a tedious and error-prone task even for expert programmers. Thus, it is necessary to develop a communication system which is highly abstracted but still capable of optimization. For this purpose, this paper proposes an OpenCL based communication library. To maximize performance improvement, the proposed library provides a simple but effective programming interface based on Stream Graph in order to specify an applications communication pattern. We have implemented a prototype system on OpenCL platform and applied it to several image processing applications. Our evaluation shows that the library successfully masks the details of accelerator memory management while it can achieve comparable speedup to manual optimization in which we use existing low level interfaces.\",\"PeriodicalId\":378335,\"journal\":{\"name\":\"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum\",\"volume\":\"41 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-05-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2012.68\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2012.68","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Communication Library to Overlap Computation and Communication for OpenCL Application
User-friendly parallel programming environments, such as CUDA and OpenCL are widely used for accelerators. They provide programmers with useful APIs, but the APIs are still low level primitives. Therefore, in order to apply communication optimization techniques, such as double buffering techniques, programmers have to manually write the programs with the primitives. Manual communication optimization requires programmers to have significant knowledge of both application characteristics and CPU-accelerator architecture. This prevents many application developers from effective utilization of accelerators. In addition, managing communication is a tedious and error-prone task even for expert programmers. Thus, it is necessary to develop a communication system which is highly abstracted but still capable of optimization. For this purpose, this paper proposes an OpenCL based communication library. To maximize performance improvement, the proposed library provides a simple but effective programming interface based on Stream Graph in order to specify an applications communication pattern. We have implemented a prototype system on OpenCL platform and applied it to several image processing applications. Our evaluation shows that the library successfully masks the details of accelerator memory management while it can achieve comparable speedup to manual optimization in which we use existing low level interfaces.