GPU-based Parallel Householder Bidiagonalization

IEEE International Symposium on High-Performance Parallel Distributed Computing, Pub Date: 2010-06-21, DOI: 10.1145/1851476.1851512

Fangbing Liu, F. Seinstra

Citations: 4

Abstract

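In this paper, we discuss the GPU-based implementation and optimization of Householder bidiagonalization, a matrix factorization method that is an integral part of full Singular Value Decomposition (SVD), an important algorithm for many problems in the research domain of Multimedia Content Analysis (MMCA). On cluster computers, complex adaptive run-time techniques often must be implemented to overcome the growing negative performance impact of load imbalances and to ensure reasonable speedup. We show that the nature of the many-core platform makes such complex run-time parallelization techniques unnecessary in software, while achieving 64 GFLOP/s in double precision on a single GPU of a GTX 295, 82% of the theoretical peak performance.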
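To make the factorization concrete, here is a minimal sequential sketch of textbook Golub–Kahan Householder bidiagonalization in pure Python, assuming a dense m × n matrix with m ≥ n stored as nested lists. It is not the authors' GPU implementation: the function names (`bidiagonalize`, `_reflector`, `_apply_left`, `_apply_right`) are hypothetical, and the orthogonal factors U and V are not accumulated. Each step applies a left reflector to zero a column below the diagonal and a right reflector to zero a row beyond the superdiagonal. The active trailing submatrix shrinks by one row and one column per step, so the work per step falls as the reduction proceeds, which is a common source of load imbalance under static data distributions. Because B = UᵀAV with orthogonal U and V, B has the same singular values as A.

```python
import math

# Hypothetical names; a textbook sequential sketch, not the paper's GPU code.

def _reflector(x):
    """Return v with (I - 2 v v^T / v^T v) x = alpha * e1, or None if x == 0."""
    norm_x = math.sqrt(sum(t * t for t in x))
    if norm_x == 0.0:
        return None
    alpha = -math.copysign(norm_x, x[0])  # sign choice avoids cancellation in v[0]
    v = list(x)
    v[0] -= alpha
    return v

def _apply_left(A, v, r0, c0):
    """In place: A[r0:, c0:] <- (I - 2 v v^T / v^T v) A[r0:, c0:]."""
    vtv = sum(t * t for t in v)
    for j in range(c0, len(A[0])):
        s = sum(v[i] * A[r0 + i][j] for i in range(len(v)))
        f = 2.0 * s / vtv
        for i in range(len(v)):
            A[r0 + i][j] -= f * v[i]

def _apply_right(A, v, r0, c0):
    """In place: A[r0:, c0:] <- A[r0:, c0:] (I - 2 v v^T / v^T v)."""
    vtv = sum(t * t for t in v)
    for i in range(r0, len(A)):
        s = sum(A[i][c0 + j] * v[j] for j in range(len(v)))
        f = 2.0 * s / vtv
        for j in range(len(v)):
            A[i][c0 + j] -= f * v[j]

def bidiagonalize(A):
    """Reduce an m x n matrix (m >= n) to upper bidiagonal B = U^T A V.

    Returns (d, e): the diagonal and superdiagonal of B. U and V are not formed.
    """
    A = [[float(t) for t in row] for row in A]  # work on a copy
    m, n = len(A), len(A[0])
    if m < n:
        raise ValueError("this sketch assumes m >= n")
    for k in range(n):
        # Left reflector: zero column k below the diagonal.
        v = _reflector([A[i][k] for i in range(k, m)])
        if v is not None:
            _apply_left(A, v, k, k)
        # Right reflector: zero row k beyond the superdiagonal.
        if k < n - 2:
            v = _reflector([A[k][j] for j in range(k + 1, n)])
            if v is not None:
                _apply_right(A, v, k, k + 1)
    d = [A[k][k] for k in range(n)]
    e = [A[k][k + 1] for k in range(n - 1)]
    return d, e
```

For example, `bidiagonalize([[4, 3, 2], [1, 5, 7], [2, 8, 6]])` returns a diagonal whose entries multiply to 84 in absolute value (up to rounding), matching |det(A)|, since orthogonal transformations preserve the determinant up to sign. A full SVD would then compute the singular values of the bidiagonal B, for example with an implicit QR iteration.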
