Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications: Latest Articles

MPI Acceleration of Image Classification: Are We Seeing the Resurgence of MPI in Solving Big Data Problems?

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications Pub Date : 2017-06-26 DOI: 10.1145/3085158.3091993

Sameer Kumar

Citations: 0

How Effective is Design Abstraction in Thrust?: An Empirical Evaluation

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications Pub Date : 2017-06-26 DOI: 10.1145/3085158.3086159

Ajai V. George, Sankar Manoj, S. Gupte, S. Sarkar

Abstract: High performance computing applications are difficult to write, so practitioners expect well-tuned software to last and to keep delivering optimized performance even when the hardware is upgraded. It may also be necessary to write software with sufficient abstraction over the hardware so that it can run on heterogeneous architectures. A good design abstraction paradigm strikes a balance between abstraction and visibility over the hardware, allowing the programmer to write applications without having to understand hardware nuances while still exploiting the computing power optimally. In this paper we analyze the power of design abstraction of a popular framework called Thrust from both ease-of-programming and performance perspectives. We show that while Thrust describes an algorithm well compared to the native CUDA or OpenMP versions, it has quite a few design limitations. With respect to CUDA, it does not provide the programmer any abstraction over shared, texture, or constant memory usage. We compare the performance of a Thrust application on the CUDA, OpenMP, and CPP backends against native versions written for these backends (implementing exactly the same algorithm) and find that the current Thrust version performs poorly in most cases. While we conclude that the framework is not ready for writing applications that can exploit optimal performance from the hardware, we also highlight the improvements necessary for the framework to make the performance comparable.

Citations: 1

Session details: Session 1

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications Pub Date : 2017-06-26 DOI: 10.1145/3248714

Atul Kumar

Citations: 0

Session details: Session 2

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications Pub Date : 2017-06-26 DOI: 10.1145/3248715

S. Sarkar

Citations: 0

Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications Pub Date : 2017-06-26 DOI: 10.1145/3085158.3086160

Chao Liu, J. Bhimani, M. Leeser

Abstract: Heterogeneous computing platforms that use GPUs for acceleration are becoming prevalent, so developing parallel applications for GPU platforms and optimizing them for good performance is important. In this work, we develop a set of applications based on a high-level task design, which ensures a well-defined structure that improves portability. Together with the GPU task implementation, we use a uniform interface to allocate and manage memory blocks that are used by both host and device. In this way we can easily and flexibly choose the appropriate type of memory for host/device communication in GPU tasks. Through asynchronous task execution and CUDA streams, we can explore concurrent GPU kernels for performance improvement when running multiple tasks. We developed a benchmark set containing nine different kernel applications. Our tests show that pinned memory can improve host/device data transfer on GPU platforms. The performance of unified memory varies considerably across GPU architectures, and it is not a good choice if performance is the main focus. The multiple-task tests show that applications based on our GPU tasks can effectively use the concurrent kernel capability of modern GPUs for better resource utilization.

Citations: 3

READEX Tool Suite for Energy-efficiency Tuning of HPC Applications

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications Pub Date : 2017-06-26 DOI: 10.1145/3085158.3091994

Anamika Chowdhury, Madhura Kumaraswamy, M. Gerndt

Citations: 3

PRESGen: A Fully Automatic Equivalence Checker for Validating Optimizing and Parallelizing Transformations

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications Pub Date : 2017-06-26 DOI: 10.1145/3085158.3086158

S. Bandyopadhyay, K. Banerjee

Abstract: Petri nets have been a popular model of computation (MoC) for representing parallel programs. PRES+ is an extension of the traditional Petri net model that is specially equipped to model embedded systems precisely. Since multi-core and multiprocessor systems have proliferated in the embedded domain as well, it has become critical to validate the optimizing and parallelizing transformations that embedded system specifications go through before being implemented in hardware. PRES+ model based equivalence checkers for validating such transformations already exist. However, these checkers did not automate construction of the PRES+ models from the original and the translated code; the models had to be built manually, which left scope for inaccurate representation. Moreover, a PRES+ model tends to grow more rapidly with program size than other MoCs, such as FSMD. To tackle these problems, we propose a method for automated construction of PRES+ models from high-level language programs and, using an existing translation scheme to convert PRES+ models to FSMD models, validate the transformations with a state-of-the-art FSMD equivalence checker. We have thus composed an end-to-end, fully automatic equivalence checker for validating optimizing and parallelizing transformations. The experimental results demonstrate the practical applicability of our method.

Citations: 2

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications

Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications DOI: 10.1145/3085158

Citations: 0