{"title":"Defects in parallel Monte Carlo and quasi-Monte Carlo integration using the leap-frog technique","authors":"K. Entacher, Thomas Schell, W. C. Schmid, A. Uhl","doi":"10.1080/1063719031000088021","DOIUrl":"https://doi.org/10.1080/1063719031000088021","url":null,"abstract":"Currently, the most efficient numerical techniques for evaluating high-dimensional integrals are based on Monte Carlo and quasi-Monte Carlo techniques. These tasks require a significant amount of computation and are therefore often executed on parallel computer systems. In order to keep the communication amount within a parallel system to a minimum, each processing element (PE) requires its own source of integration nodes. Therefore, techniques for using separately initialized and disjoint portions of a given point set on a single PE are classically employed. Using the so-called substreams may lead to dramatic errors in the results under certain circumstances. In this work, we compare the possible defects employing leaped quasi-Monte Carlo and Monte Carlo substreams. Apart from comparing the magnitude of the observed integration errors we give an overview under which circumstances (i.e. parallel programming models) such errors can occur.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133254130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special Issue: A systolic block-Jacobi SVD solver for processor meshes","authors":"G. Okša, M. Vajtersic","doi":"10.1080/1063719031000088003","DOIUrl":"https://doi.org/10.1080/1063719031000088003","url":null,"abstract":"We design the systolic version of the two-sided block-Jacobi algorithm for the singular value decomposition (SVD) of matrix A∈R m×n , and m, n even. The algorithm involves the class CO of parallel orderings on the two-dimensional toroidal mesh with p processors. The mathematical background is based on the QR decomposition (QRD) of local data matrices and on the triangular Kogbetliantz algorithm (TKA) for local SVDs in the diagonal mesh processors. Subsequent updates of local matrices in the diagonal as well as nondiagonal mesh processors are required. We show that all updates can be realized by orthogonal modified Givens rotations. These rotations can be efficiently pipelined in parallel in the horizontal and vertical rings of processor through the toroidal mesh. Our solution requires, per one mesh processor, systolic processing elements (PEs) and additional delay elements. The time complexity can be estimated as where w is the number of global sweeps in the two-sided block-Jacobi algorithm and Δ is the length of the global synchronization time step. The VLSI area per mesh processor, measured by the number of vertical and horizontal wires required for its construction, can be estimated as and the combined VLSI area–time complexity per mesh processor is The theoretical speedup can be estimated as Using the mesh processors of fixed inner size , even, it is possible to construct the square two-dimensional toroidal mesh and to compute the SVD of matrix A, the size of the which matches the shape of mesh processors, i.e. In this sense, the systolic algorithm is scalable.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2003-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125612576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LEFTMOST EIGENVALUE OF REAL AND COMPLEX SPARSE MATRICES ON PARALLEL COMPUTER USING APPROXIMATE INVERSE PRECONDITIONING","authors":"G. Pini","doi":"10.1080/10637190208941433","DOIUrl":"https://doi.org/10.1080/10637190208941433","url":null,"abstract":"An efficient parallel approach for the computation of the eigenvalue of smallest absolute magnitude of sparse real and complex matrices is provided. The proposed strategy tries to improve the efficiency of the reverse power method. At each inverse power iteration the linear system is solved either by the conjugate gradient scheme (symmetric case) or by the Bi-CGSTAB method (symmetric case). Both solvers are preconditioned employing the approximate inverse factorization and thus are easily parallelized. The satisfactory speed-ups obtained on the CRAY T3E supercomputer show the high degree of parallelization reached by the proposed algorithm.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127201542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AN O∥LOG P) PARALLEL IMPLEMENTATION OF FEEDBACK GUIDED DYNAMIC LOOP SCHEDULING","authors":"T. Tabirca, Len Freeman, S. Tabirca","doi":"10.1080/10637190208941438","DOIUrl":"https://doi.org/10.1080/10637190208941438","url":null,"abstract":"Feedback Guided Dynamic Loop Scheduling (FGDLS) is a recently proposed dynamic algorithm for loop scheduling. The original algorithm required an O(p) serial computation at each stage to compute the updated loop schedule. In this paper, it is shown that this computation can be implemented in O(log p) operations on p processors","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130489061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NUMERICAL SOLUTION OF DISCRETE STABLE LINEAR MATRIX EQUATIONS ON MULTICOMPUTERS","authors":"P. Benner, E. S. Quintana‐Ortí, G. Quintana-Ortí","doi":"10.1080/10637190208941436","DOIUrl":"https://doi.org/10.1080/10637190208941436","url":null,"abstract":"We investigate the parallel performance of numerical algorithms for solving discrete Sylvester and Stein equations as they appear for instance in discrete-time control problems, filtering, and image restoration. The methods used here are the squared Smith iteration and the sign function method on a Cayley transformation of the original equation. For Stein equations with semidefinite right-hand side these methods are modified such that the Cholesky factor of the solution can be computed directly without forming the solution matrix explicitly. We report experimental results of these algorithms on distributed-memory multicomputers","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"464 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116185894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PORTING REGULAR APPLICATIONS ON HETEROGENEOUS WORKSTATION NETWORKS: PERFORMANCE ANALYSIS AND MODELING","authors":"A. Clematis, A. Corana","doi":"10.1080/01495730108941441","DOIUrl":"https://doi.org/10.1080/01495730108941441","url":null,"abstract":"Abstract Heterogeneous networks of workstations and/or personal computers (NOW) are increasingly used as a powerful platform for the execution of parallel applications. When applications previously developed for traditional parallel machines (homogeneous and dedicated) are ported to NOWs, performance worsens owing in part to less efficient communications but more often to unbalancing. In this paper, we address the problem of the efficient porting to heterogeneous NOWs of data-parallel applications originally developed using the SPMD paradigm for homogeneous parallel systems with regular topology like ring. To achieve good performance, the computation time on the various machines composing the NOW must be as balanced as possible. This can be obtained in two ways: by using an heterogeneous data partition strategy with a single process per node, or by splitting homogeneously data among processes and assigning to each node a number of processes proportional to its computing power. The first method is however more difficult, since some modifications in the code are always needed, whereas the second approach requires very few changes. We carry out a simplified but reliable analysis, and propose a simple model able to simulate performance in the various situations. Two test cases, matrix multiplication and computation of long-range interactions, are considered, obtaining a good agreement between simulated and experimental results. Our analysis shows that an efficient porting of regular homogeneous data-parallel applications on heterogeneous NOWs is possible. Particularly, the approach based on multiple processes per node turns out to be a straightforward and effective way for achieving very satisfying performance in almost all situations, even dealing with highly heterogeneous systems.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127652873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DERIVING A FAST SYSTOLIC ALGORITHM FOR THE LONGEST COMMON SUBSEQUENCE PROBLEM","authors":"Yen-Chun Lin, J. Yeh","doi":"10.1080/10637190208941431","DOIUrl":"https://doi.org/10.1080/10637190208941431","url":null,"abstract":"The longest common subsequence (LCS) problem is to find an LCS of two given sequences and the length of the LCS. In this paper, an efficient systolic algorithm for the LCS problem is derived. For two sequences of length m and n, where m ≥ n, the problem can be solved with only [n/2] processors in m + 2[n/2] − 1 time steps. Compared with other systolic algorithms that solve the LCS problem, our algorithm not only takes fewer time steps but also uses fewer processors. Our algorithm is better suited to implementation on multicomputers than other systolic algorithms.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127810884","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A PARALLEL DIVIDE AND CONQUER ALGORITHM FOR NON SYMMETRIC TRIDIAGONAL TOEPLITZ SYSTEMS USING CONJUGATE GRADIENT","authors":"L. Garey, R. E. Shaw, J. Zhang","doi":"10.1080/01495730208941443","DOIUrl":"https://doi.org/10.1080/01495730208941443","url":null,"abstract":"Abstract In this paper, we consider the application of the conjugate gradient method specifically to solve non symmetric systems which are large, tridiagonal and Toeplitz. Under the condition that the system is diagonally dominant, one can pre-multiply the system by the transpose of the coefficient matrix and take advantage of the structure of the new coefficient matrix to perturb and factor it. This allows us to divide the task of solution containing pairs of tridiagonal, symmetric and Toeplitz systems and to solve the pairs of systems using a parallel implementaton of congujate gradient. Final corrections, to account for the perturbations, provide a numerical approximation to the solution.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116687927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"THE LOAD DISTRIBUTION PROBLEM IN A PROCESSOR RING","authors":"F. Lau","doi":"10.1080/01495730108941440","DOIUrl":"https://doi.org/10.1080/01495730108941440","url":null,"abstract":"Abstract Given a global picture of the system load and the average load, the load distribution problem is to find a suitable schedule, consisting of the amount of excess load to transfer along every edge, so that the system load can be balanced in minimal time by executing the schedule. We study this problem for the ring topology We discuss some existing algorithms, show how they fall short of being able to generate optimal schedules, and present a simple algorithm that would generate an optimal schedule for any given system load instance. This simple algorithm relies on an existing algorithm to create a search window in which the optimal solution is to be found.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126049067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ON MAX CUT IN CUBIC GRAPHS","authors":"T. Calamoneri, Irene Finocchi, Y. Manoussakis, R. Petreschi","doi":"10.1080/01495730108941439","DOIUrl":"https://doi.org/10.1080/01495730108941439","url":null,"abstract":"Abstract This paper is concerned with the maximum cut problem in parallel on cubic graphs. New theoretical results characterizing the cardinality of the cut are presented. These results make it possible to design a simple combinatorial O(log n) time parallel algorithm, running on a CRCW P-RAM with O(n) processors. The approximation ratio achieved by the algorithm is 1·3 and improves the best known parallel approximation ratio, i.e. 2, in the special class of cubic graphs. The algorithm also guarantees that the size of the returned cut is at least ((9g −3)/8 g)n, where g is the odd girth of the input graph. Experimental results round off the paper, showing that the solutions obtained in practice are likely to be much better than the theoretical lower bound.","PeriodicalId":406098,"journal":{"name":"Parallel Algorithms and Applications","volume":"311 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116805085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}