An On-chip Heterogeneous Implementation of a General Sparse Linear Solver

2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum. Pub Date: 2013-05-20. DOI: 10.1109/IPDPSW.2013.51

Arash Sadrieh, Stefano Charissis, A. Hill


Citations: 1

Abstract



Inter-device communication is a common limitation of GPGPU computing methods. The on-chip heterogeneous architecture of a recent class of accelerated processing units (APUs), which combine programmable CPU and GPU cores on the same die, presents an opportunity to address this problem. Here we describe an APU-based heterogeneous implementation of the Jacobi-preconditioned conjugate gradient method and identify a set of optimal configurations based on an examination of standard matrices. By leveraging the low-latency memory transactions of the APU and exploiting CPU/GPU cohabitation for concurrent vector operations, the implementation achieves performance comparable to that of a high-end GPU running CUSP. Our results show that on-chip heterogeneous architectures can be attractively cost-effective, and can even perform better for applications with a low number of linear solver iterations and where device-to-device data transfer is significant. Accordingly, the APU architecture and associated GPAPU methods have significant potential as a low-cost, energy-efficient alternative for parallel HPC architectures.
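For readers unfamiliar with the method named in the abstract, the following is a minimal, CPU-only sketch of a Jacobi-preconditioned conjugate gradient solver for a symmetric positive-definite matrix in CSR form. It is not the authors' APU implementation and does not model any CPU/GPU work split; the function names (`csr_matvec`, `jacobi_pcg`, `poisson_1d_csr`), the tolerance, and the 1D Poisson test matrix are all assumptions chosen for illustration.

```python
import math


def csr_matvec(n, indptr, indices, data, x):
    """Return y = A @ x for an n-row matrix A stored in CSR form."""
    y = [0.0] * n
    for row in range(n):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(n, indptr, indices, data, b, tol=1e-10, max_iter=1000):
    """Solve A x = b with conjugate gradient, preconditioned by diag(A)^-1.

    Assumes A is symmetric positive definite with a nonzero diagonal.
    Returns (x, iterations_used).
    """
    # Jacobi preconditioner: the inverse of each diagonal entry of A.
    inv_diag = [0.0] * n
    for row in range(n):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                inv_diag[row] = 1.0 / data[k]

    x = [0.0] * n
    r = list(b)                                   # residual for x = 0
    z = [d * ri for d, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)                                   # first search direction
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0

    for it in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        ap = csr_matvec(n, indptr, indices, data, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def poisson_1d_csr(n):
    """Build the n x n tridiagonal matrix (-1, 2, -1) in CSR form."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


if __name__ == "__main__":
    n = 5
    indptr, indices, data = poisson_1d_csr(n)
    # Choose b so that the exact solution is the all-ones vector.
    b = csr_matvec(n, indptr, indices, data, [1.0] * n)
    x, iters = jacobi_pcg(n, indptr, indices, data, b)
    print(x, iters)
```

Each iteration consists of one sparse matrix-vector product plus a handful of dot products and vector updates. The latter are presumably the kind of "concurrent vector operations" the abstract has in mind, although how the authors divide them between CPU and GPU cores is not stated here.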
