A New Data-Mapping Scheme for Latency-Tolerant Distributed Sparse Triangular Solution
K. Teranishi, P. Raghavan, E. Ng
ACM/IEEE SC 2002 Conference (SC'02), 2002-11-16
DOI: 10.1109/SC.2002.10020 (https://doi.org/10.1109/SC.2002.10020)
This paper concerns latency-tolerant schemes for the efficient parallel solution of sparse triangular linear systems on distributed-memory multiprocessors. Such triangular solutions are required when sparse Cholesky factors are used to solve for a sequence of right-hand-side vectors, or when incomplete sparse Cholesky factors are used to precondition a Conjugate Gradients iterative solver. In these applications, traditional distributed substitution schemes can become a performance bottleneck when the latency of interprocessor communication is large. We previously developed the Selective Inversion (SI) scheme, which reduces communication latency costs by replacing distributed substitution with parallel matrix-vector multiplication. We now present a new two-way mapping of the sparse triangular matrix to processors that improves the performance of SI by halving its communication latency costs. We provide analytic results for model sparse matrices, and we report on the performance of our scheme for parallel preconditioning with incomplete sparse Cholesky factors.
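The core idea behind replacing substitution with matrix-vector multiplication can be illustrated with a small serial sketch. It is not the authors' implementation: the function names, the fixed block size, and the dense list-of-lists storage are all assumptions made for illustration, and the sketch models neither the distribution across processors nor the two-way mapping. It only shows that once the diagonal blocks of a lower-triangular matrix are inverted in a one-time setup step, each block of the solution is obtained by a multiplication rather than by a chain of dependent substitution steps.

```python
def forward_substitution(L, b):
    """Solve L x = b for lower-triangular L, one row at a time."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        # x[i] depends on every earlier x[j]; this is the dependency chain.
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (b[i] - s) / L[i][i]
    return x


def invert_lower_triangular(T):
    """Invert a small dense lower-triangular block column by column."""
    n = len(T)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        unit = [1.0 if r == c else 0.0 for r in range(n)]
        col = forward_substitution(T, unit)
        for r in range(n):
            inv[r][c] = col[r]
    return inv


def solve_with_inverted_blocks(L, b, block):
    """Block forward solve with pre-inverted diagonal blocks.

    `block` is a hypothetical fixed block size chosen for illustration.
    """
    n = len(b)
    starts = list(range(0, n, block))
    # One-time setup; the inverses can be reused for every right-hand side.
    inverses = []
    for s in starts:
        e = min(s + block, n)
        inverses.append(invert_lower_triangular([row[s:e] for row in L[s:e]]))
    x = [0.0] * n
    for k, s in enumerate(starts):
        e = min(s + block, n)
        # Subtract contributions of solution entries from earlier blocks.
        r = [b[i] - sum(L[i][j] * x[j] for j in range(s)) for i in range(s, e)]
        # Matrix-vector product with the inverted block replaces substitution.
        for i in range(e - s):
            x[s + i] = sum(inverses[k][i][j] * r[j] for j in range(e - s))
    return x


L = [
    [2.0, 0.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 1.0, 4.0, 0.0],
    [2.0, 0.0, 1.0, 5.0],
]
b = [2.0, 5.0, 9.0, 13.0]
x_sub = forward_substitution(L, b)
x_inv = solve_with_inverted_blocks(L, b, block=2)
```

Both routines return the same solution up to rounding. In a distributed setting, the substitution inside a block would require a sequence of dependent messages, whereas the rows of a matrix-vector product are independent of one another, which is why the inversion cost can pay off when the same factor is applied to many right-hand sides.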