The scalable petascale data-driven approach for the Cholesky factorization with multiple GPUs

ESPM '15. Pub Date: 2015-11-15. DOI: 10.1145/2832241.2832245

Yuki Tsujita, Toshio Endo, K. Fujisawa

Citations: 2

Abstract

The Cholesky factorization is an important linear algebra kernel used in solving semidefinite programming (SDP) problems. However, the large computational cost of the Cholesky factorization of the Schur complement matrix (SCM) has been an obstacle to solving large-scale problems. This paper describes a new version of the parallel SDP solver SDPARA, which has been equipped with a Cholesky factorization implementation and demonstrated 1.7 PFlops performance with over two million constraints using 4,080 GPUs. Performance and scalability are further improved by introducing a data-driven approach in place of the traditional synchronous approach. We also point out that typical data-driven implementations have limited scalability, and demonstrate the efficiency of the proposed approach through experiments on the TSUBAME2.5 supercomputer.
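The contrast between a synchronous and a data-driven Cholesky factorization can be illustrated with a minimal Python sketch. This is a generic illustration of the idea, not SDPARA's implementation: the abstract does not describe the solver's internals. The sketch assumes 1x1 "tiles" (single matrix entries) and a single-threaded executor, and the function names `build_tasks`, `run_task`, and `data_driven_cholesky` are hypothetical. The task labels `potrf`, `trsm`, and `update` are borrowed from the usual LAPACK/BLAS naming for the three steps of a right-looking factorization. Each task runs as soon as the tasks producing its inputs have finished, with no barrier between elimination steps.

```python
import math
from collections import deque


def build_tasks(n):
    """Map each task to the set of tasks that must finish before it (assumed task model)."""
    deps = {}
    for k in range(n):
        # Diagonal task: needs every earlier update of entry (k, k).
        deps[("potrf", k)] = {("update", k, k, j) for j in range(k)}
        for i in range(k + 1, n):
            # Column task: needs the diagonal of column k and earlier updates of (i, k).
            deps[("trsm", i, k)] = {("potrf", k)} | {("update", i, k, j) for j in range(k)}
        for i in range(k + 1, n):
            for j in range(k + 1, i + 1):
                # Trailing update of (i, j): needs the two column-k entries it multiplies.
                deps[("update", i, j, k)] = {("trsm", i, k), ("trsm", j, k)}
    return deps


def run_task(task, a):
    """Apply one task in place to the lower triangle of a."""
    if task[0] == "potrf":
        k = task[1]
        a[k][k] = math.sqrt(a[k][k])
    elif task[0] == "trsm":
        _, i, k = task
        a[i][k] /= a[k][k]
    else:
        _, i, j, k = task
        a[i][j] -= a[i][k] * a[j][k]


def data_driven_cholesky(matrix):
    """Return (L, execution order) for a symmetric positive definite matrix."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    deps = build_tasks(n)
    remaining = {t: len(d) for t, d in deps.items()}
    dependents = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            dependents[p].append(t)
    # A task becomes ready when its last prerequisite completes.
    ready = deque(t for t, count in remaining.items() if count == 0)
    order = []
    while ready:
        t = ready.popleft()
        run_task(t, a)
        order.append(t)
        for s in dependents[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    # Only the lower triangle holds the factor; clear the rest.
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = 0.0
    return a, order


if __name__ == "__main__":
    spd = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]
    factor, order = data_driven_cholesky(spd)
    for row in factor:
        print(row)
    print(len(order), "tasks executed")
```

In a synchronous version, all `trsm` tasks of step k would have to finish before any `update` starts, and all updates before the next `potrf`. Here the ready queue lets a task from a later step start while unrelated tasks from an earlier step are still pending, which is the property a parallel runtime would exploit.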
