Parallel Process. Lett.: Latest Articles

A Note on the Steiner k-Diameter of Tensor Product Networks

Parallel Process. Lett. Pub Date : 2019-06-01 DOI: 10.1142/S0129626419500087

Pranav Arunandhi, E. Cheng, Christopher Melekian

Citations: 1

The Generalized Connectivity of Data Center Networks

Parallel Process. Lett. Pub Date : 2019-06-01 DOI: 10.1142/S0129626419500075

Chen Hao, Weihua Yang

Citations: 5

Round Robin Thread Selection Optimization in Multithreaded Processors

Parallel Process. Lett. Pub Date : 2019-05-10 DOI: 10.1142/S0129626419500038

Shane Carroll, Wei-Ming Lin

Citations: 0

Efficient Algebraic Multigrid Preconditioners on Clusters of GPUs

Parallel Process. Lett. Pub Date : 2019-05-10 DOI: 10.1142/S0129626419500014

A. A. Hassan, V. Cardellini, P. D'Ambra, D. Serafino, S. Filippone

Abstract: Many scientific applications require the solution of large and sparse linear systems of equations using Krylov subspace methods; in this case, the choice of an effective preconditioner may be crucial for the convergence of the Krylov solver. Algebraic MultiGrid (AMG) methods are widely used as preconditioners because of their optimal computational cost and their algorithmic scalability. The wide availability of GPUs, now found in many of the fastest supercomputers, poses the problem of implementing these methods efficiently on high-throughput processors. In this work we focus on the application phase of AMG preconditioners, and in particular on the choice and implementation of smoothers and coarsest-level solvers capable of exploiting the computational power of clusters of GPUs. We consider block-Jacobi smoothers using sparse approximate inverses in the solve phase associated with the local blocks. The choice of approximate inverses instead of sparse matrix factorizations is driven by the large amount of parallelism exposed by the matrix-vector product as compared to the solution of large triangular systems on GPUs. The selected smoothers and solvers are implemented within the AMG preconditioning framework provided by the MLD2P4 library, using suitable sparse matrix data structures from the PSBLAS library. Their behaviour is illustrated in terms of execution speed and scalability on a test case concerning groundwater modelling, provided by the Jülich Supercomputing Center within the Horizon 2020 Project EoCoE.

Citations: 4
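The abstract's argument for approximate inverses (each local block solve becomes a matrix-vector product instead of a triangular solve) can be illustrated with a small pure-Python sketch. This is not MLD2P4 or PSBLAS code and not the authors' method: the matrices are small and dense, and a truncated Neumann series around each block's diagonal is swapped in as a simple stand-in for the sparse approximate inverses used in the paper. All function names (matvec, matmul, neumann_inverse, block_jacobi) are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product y = M x."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Dense matrix-matrix product C = A B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def neumann_inverse(B: Matrix, terms: int = 3) -> Matrix:
    """Approximate inv(B) by sum_{k<terms} E^k D^{-1}, with D = diag(B) and
    E = I - D^{-1} B. Assumption: this truncated Neumann series stands in for
    the paper's sparse approximate inverses; it is only valid when the
    spectral radius of E is below 1 (e.g. strictly diagonally dominant B)."""
    n = len(B)
    d_inv = [1.0 / B[i][i] for i in range(n)]
    E = [[(1.0 if i == j else 0.0) - d_inv[i] * B[i][j] for j in range(n)]
         for i in range(n)]
    term = [[d_inv[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    approx = [row[:] for row in term]
    for _ in range(1, terms):
        term = matmul(E, term)  # next series term E^k D^{-1}
        approx = [[a + t for a, t in zip(ra, rt)] for ra, rt in zip(approx, term)]
    return approx


def block_jacobi(A: Matrix, b: Vector, x: Vector, block_size: int,
                 sweeps: int, terms: int = 3) -> Vector:
    """Block-Jacobi iteration x <- x + M^{-1} (b - A x), where M^{-1} is block
    diagonal and holds an approximate inverse of each diagonal block of A."""
    n = len(A)
    blocks = [list(range(s, min(s + block_size, n))) for s in range(0, n, block_size)]
    # Setup phase: build one approximate inverse per diagonal block.
    inverses = [neumann_inverse([[A[i][j] for j in blk] for i in blk], terms)
                for blk in blocks]
    x = list(x)
    for _ in range(sweeps):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        for blk, inv in zip(blocks, inverses):
            # Apply phase: one small mat-vec per block, no triangular solves.
            for i, c in zip(blk, matvec(inv, [r[i] for i in blk])):
                x[i] += c
    return x


if __name__ == "__main__":
    n = 8
    # Strictly diagonally dominant tridiagonal test matrix (hypothetical example).
    A = [[10.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    b = matvec(A, [1.0] * n)  # the exact solution is the all-ones vector
    x = block_jacobi(A, b, [0.0] * n, block_size=2, sweeps=50)
    print(max(abs(xi - 1.0) for xi in x))
```

The inverses are built once in the setup phase; every subsequent sweep needs only matrix-vector products, which is the operation the abstract identifies as exposing more parallelism on GPUs than sparse triangular solves.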

Efficient Communication Induced Checkpointing Protocol for Broadcast Network-based Distributed Systems

Parallel Process. Lett. Pub Date : 2019-05-10 DOI: 10.1142/S012962641950004X

Jinho Ahn

Citations: 2

Implementing ♢P with Bounded Messages on a Network of ADD Channels

Parallel Process. Lett. Pub Date : 2019-05-10 DOI: 10.1142/S0129626419500026

Saptaparni Kumar, J. Welch

Citations: 6

Optimizing Data Intensive Flows for Networks on Chips

Parallel Process. Lett. Pub Date : 2018-12-18 DOI: 10.1142/S0129626421500134

Junwei Zhang, Yang Liu, Shi Li, T. Robertazzi

Citations: 3

Reconfigurable Hardware Generation of Multigrid Solvers with Conjugate Gradient Coarse-Grid Solution

Parallel Process. Lett. Pub Date : 2018-12-01 DOI: 10.1142/S0129626418500160

Christian Schmitt, Moritz Schmid, S. Kuckuk, H. Köstler, Jürgen Teich, Frank Hannig

Abstract: Field programmable gate arrays (FPGAs) are an increasingly popular accelerator technology, and not only in the field of high-performance computing (HPC). However, they use a completely different programming paradigm and tool set compared to central processing units (CPUs) or even graphics processing units (GPUs), adding extra development steps and requiring special knowledge, which hinders their widespread use in scientific computing. To bridge this programmability gap, domain-specific languages (DSLs) are a popular choice for generating low-level implementations from an abstract algorithm description. In this work, we demonstrate our approach for generating FPGA implementations of numerical solvers based on the multigrid method from the same code base that is also used to generate code for CPUs with a hybrid MPI and OpenMP parallelization. Our approach yields a hardware design that computes up to 11 V-cycles per second for an input grid size of 4096 × 4096, with the coarsest level solved by the conjugate gradient (CG) method, on a mid-range FPGA, beating vectorized, multi-threaded execution on an Intel Xeon processor.

Citations: 3
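The solver structure named in the abstract (multigrid V-cycles with a conjugate gradient solve on the coarsest grid) can be illustrated in software. This is a minimal pure-Python sketch under assumptions of our own, not the authors' DSL, generated code, or FPGA design (which targets a 4096 × 4096 grid): a 1D model problem -u'' = f with zero boundary values, grids with 2^L - 1 interior points, weighted Jacobi smoothing (ω = 2/3), full-weighting restriction, linear interpolation, and rediscretized coarse-grid operators. All function names are hypothetical.

```python
from typing import List

Vec = List[float]
# Assumptions (not from the paper): 1D Poisson problem, n = 2^L - 1 interior points,
# weighted Jacobi smoother, full weighting, linear interpolation, rediscretized coarse operators.


def apply_a(u: Vec, h: float) -> Vec:
    """Discrete -u'' on interior points with zero Dirichlet boundary values."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * u[i] - left - right) / (h * h))
    return out


def residual(u: Vec, f: Vec, h: float) -> Vec:
    return [fi - ai for fi, ai in zip(f, apply_a(u, h))]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cg(f: Vec, h: float, rel_tol: float = 1e-12, max_iter: int = 1000) -> Vec:
    """Unpreconditioned conjugate gradient for A u = f, starting from u = 0."""
    u = [0.0] * len(f)
    r = list(f)
    p = list(r)
    rr = dot(r, r)
    stop = rel_tol * rel_tol * rr
    for _ in range(max_iter):
        if rr <= stop:
            break
        ap = apply_a(p, h)
        alpha = rr / dot(p, ap)
        u = [ui + alpha * pi for ui, pi in zip(u, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return u


def jacobi(u: Vec, f: Vec, h: float, sweeps: int, omega: float = 2.0 / 3.0) -> Vec:
    """Weighted Jacobi smoothing; the diagonal of A is 2 / h^2."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [ui + omega * (h * h / 2.0) * ri for ui, ri in zip(u, r)]
    return u


def restrict(r: Vec) -> Vec:
    """Full weighting from 2m + 1 fine points to m coarse points."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(m)]


def prolong(e: Vec) -> Vec:
    """Linear interpolation from m coarse points to 2m + 1 fine points."""
    padded = [0.0] + e + [0.0]  # zero boundary values around the coarse grid
    fine = [0.0] * (2 * len(e) + 1)
    for j, ej in enumerate(e):
        fine[2 * j + 1] = ej  # coincident points are copied
    for i in range(len(e) + 1):
        fine[2 * i] = 0.5 * (padded[i] + padded[i + 1])  # in-between points are averaged
    return fine


def v_cycle(u: Vec, f: Vec, h: float, coarsest: int = 3, nu1: int = 2, nu2: int = 2) -> Vec:
    if len(f) <= coarsest:
        return cg(f, h)  # coarsest level: solve with conjugate gradient
    u = jacobi(u, f, h, nu1)  # pre-smoothing
    rc = restrict(residual(u, f, h))  # restrict the residual
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, coarsest, nu1, nu2)  # coarse correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec))]
    return jacobi(u, f, h, nu2)  # post-smoothing


if __name__ == "__main__":
    n, h = 63, 1.0 / 64  # 2^6 - 1 interior points
    f = [1.0] * n  # -u'' = 1, whose exact solution is x(1 - x) / 2
    u = [0.0] * n
    for cycle in range(10):
        u = v_cycle(u, f, h)
        print(cycle, max(abs(ri) for ri in residual(u, f, h)))
```

The right-hand side f = 1 is chosen because the three-point stencil is exact for quadratics, so the discrete solution coincides with x(1 - x)/2 at the grid points and the iterates can be checked directly.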

Regular Connected Bipancyclic Spanning Subgraphs of Torus Networks

Parallel Process. Lett. Pub Date : 2018-12-01 DOI: 10.1142/S0129626418500135

M. Lu, Shurong Zhang, Weihua Yang

Citations: 1

Fractional Matching Preclusion for (n, k)-Star Graphs

Parallel Process. Lett. Pub Date : 2018-12-01 DOI: 10.1142/S0129626418500172

Tianlong Ma, Y. Mao, E. Cheng, Jinling Wang

Citations: 7