Int. J. High Speed Comput.: Latest Articles

An Approximate Agreement Algorithm for Wraparound Meshes
Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S0129053395000221
R. Cheng, C. Chung
{"title":"An Approximate Agreement Algorithm for Wraparound Meshes","authors":"R. Cheng, C. Chung","doi":"10.1142/S0129053395000221","DOIUrl":"https://doi.org/10.1142/S0129053395000221","url":null,"abstract":"An appropriate algorithm, the neighboring exchange, for reaching an approximate agreement in a wraparound mesh is proposed. The algorithm is characterized by its isotropic nature, which is of particular usefulness when applied in any symmetric system. The behavior of this algorithm can be depicted by recurrence relations which can be used to derive the convergence rate. The convergence rate is meaningful when the algorithm is used to synchnize clocks. The rate of synchronizing clocks is derived, and it can be applied to all wraparound meshes with practical scale. With the recurrence relations, we also prove the correctness of this algorithm.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117023150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
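The neighboring-exchange idea can be pictured as each node repeatedly averaging its local value (for example, a clock reading) with its four wraparound neighbours. The sketch below is only that generic isotropic averaging iteration on a simulated torus; the grid size, the uniform averaging weights, and the function names are illustrative assumptions, not the paper's exact protocol or its convergence-rate analysis.

```python
import numpy as np

def neighboring_exchange(values, rounds):
    """Average each node with its four wraparound neighbours on an n x n torus.

    A generic isotropic averaging iteration (not the paper's exact protocol);
    it only illustrates how repeated neighbour exchange shrinks the spread of
    the local values, e.g. clock readings.
    """
    v = np.asarray(values, dtype=float)
    for _ in range(rounds):
        north = np.roll(v,  1, axis=0)
        south = np.roll(v, -1, axis=0)
        west  = np.roll(v,  1, axis=1)
        east  = np.roll(v, -1, axis=1)
        v = (v + north + south + west + east) / 5.0
    return v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clocks = rng.uniform(0.0, 1.0, size=(8, 8))   # initial clock skews
    for r in (0, 5, 20):
        spread = np.ptp(neighboring_exchange(clocks, r))
        print(f"after {r:2d} rounds, max disagreement = {spread:.3e}")
```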
Parallel Matrix Multiplication Algorithms on Hypercube Multiprocessors
Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S012905339500021X
Peizong Lee
{"title":"Parallel Matrix Multiplication Algorithms on Hypercube Multiprocessors","authors":"Peizong Lee","doi":"10.1142/S012905339500021X","DOIUrl":"https://doi.org/10.1142/S012905339500021X","url":null,"abstract":"In this paper, we present three parallel algorithms for matrix multiplication. The first one, which employs pipelining techniques on a mesh grid, uses only one copy of data matrices. The second one uses multiple copies of data matrices also on a mesh grid. Although data communication operations of the second algorithm are reduced, the requirement of local data memory for each processing element increases. The third one, which uses a cubic grid, shows the trade-offs between reducing the computation time and reducing the communication overhead. Performance models and feasibilities of these three algorithms are studied. We analyze the interplay among the numbers of processing elements, the communication overhead, and the requirements of local memory in each processing element. We also present experimental results of these three algorithms on a 32-node nCUBE-2 computer.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130131670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
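As a rough picture of how matrix multiplication maps onto a mesh of processing elements, the sketch below simulates the classical Cannon block-shift algorithm, in which each simulated processor holds one block of A and one block of B and the blocks circulate with wraparound. This is a well-known single-copy mesh algorithm in the same family as the first algorithm above, not the paper's pipelined scheme; the mesh size p and the block sizes are illustrative assumptions.

```python
import numpy as np

def cannon_multiply(A, B, p):
    """Cannon-style block-shift matrix multiply on a simulated p x p mesh.

    Each 'processor' (i, j) holds one block of A and one block of B; the
    blocks are pre-skewed and then shifted left / up once per step with
    wraparound. Shown only to illustrate mesh-style data movement, not the
    paper's pipelined variant.
    """
    n = A.shape[0]
    assert n % p == 0
    b = n // p
    # Split A, B into p x p grids of b x b blocks.
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Cb = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]
    # Initial skew: row i of A shifts left by i, column j of B shifts up by j.
    Ab = [[Ab[i][(j + i) % p] for j in range(p)] for i in range(p)]
    Bb = [[Bb[(i + j) % p][j] for j in range(p)] for i in range(p)]
    for _ in range(p):
        for i in range(p):
            for j in range(p):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]
        # Shift A blocks left by one and B blocks up by one (wraparound).
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    return np.block(Cb)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
    print(np.allclose(cannon_multiply(A, B, p=4), A @ B))   # True
```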
Multithreaded Decoupled Architecture
Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S0129053395000257
M. Dorojevets, V. Oklobdzija
{"title":"Multithreaded Decoupled Architecture","authors":"M. Dorojevets, V. Oklobdzija","doi":"10.1142/S0129053395000257","DOIUrl":"https://doi.org/10.1142/S0129053395000257","url":null,"abstract":"A new computer architecture called the Multithreaded Decoupled Architecture has been proposed for exploiting fine-grain parallelism. It develops further some of the ideas of parallel processing implemented in the Russian MARS-M computer in the 1980s. The MTD architecture aims at enhancing both total machine throughput and a single thread performance. To achieve this goal, we propose a two-level parallel computation model. Its low level defines the decoupled parallel execution of instructions within program fragments not containing branches. We will be referring to these fragments as basic blocks. The model’s high level defines the parallel execution of multiple basic blocks representing a function or procedure. This scheduling hierarchy reflects the MTD storage hierarchy. Together the scheduling and storage models allow a processor with multiple execution units to exploit several forms of parallelism within a procedure. The compiler provides the hardware with thread register usage masks to allow run-time enforcing of control and data dependencies between the high level threads. We present a possible implementation of the MTD-processor with multiple execution units and two-level distributed register memory.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116828787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 20
On Optimal Weighted Binary Trees
Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S0129053395000245
J. Pradhan, C. V. Sastry
{"title":"On Optimal Weighted Binary Trees","authors":"J. Pradhan, C. V. Sastry","doi":"10.1142/S0129053395000245","DOIUrl":"https://doi.org/10.1142/S0129053395000245","url":null,"abstract":"A new recursive top-down algorithm for the construction of a unique Huffman tree is introduced. We show that the prefix codes generated from the Huffman tree are unique and the weighted path length is optimal. Initially we have not imposed any restriction on the maximum length (the number of bits) a prefix code can take. But if buffering of the source is required, we have to put a restriction on the length of the prefix code. In this context we extend the top-down recursive algorithm for generating length-limited prefix codes.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123319503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
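To make the notions of prefix code and weighted path length concrete, the sketch below builds a Huffman code with the classic bottom-up, heap-based construction. It is not the paper's top-down recursive algorithm and imposes no length limit; the symbol weights are an arbitrary example.

```python
import heapq
from itertools import count

def huffman_codes(weights):
    """Classic bottom-up Huffman construction (not the paper's top-down
    recursive algorithm): repeatedly merge the two lightest subtrees and
    read the prefix codes off the resulting binary tree."""
    tiebreak = count()                       # keeps heap entries comparable
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                       # degenerate single-symbol case
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    root = heap[0][2]
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: recurse
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a symbol
            codes[node] = prefix
    walk(root, "")
    return codes

if __name__ == "__main__":
    w = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
    codes = huffman_codes(w)
    wpl = sum(w[s] * len(c) for s, c in codes.items())
    print(codes, "weighted path length =", wpl)   # optimal WPL for this example: 224
```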
Benchmarking Fortran Intrinsic Functions
Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000129
Toru Nagai
{"title":"Benchmarking Fortran Intrinsic Functions","authors":"Toru Nagai","doi":"10.1142/S0129053395000129","DOIUrl":"https://doi.org/10.1142/S0129053395000129","url":null,"abstract":"High performance of mathematical functions is essential to speed up scientific calculations because they are very frequently used in scientific computing. This paper presents performance of important Fortran intrinsic functions on the fastest vector supercomputers. It is assumed that a relationship between CPU-time and the number of function arguments given to calculate function values is linear, and speeds of a function were measured using the parameters and . The author also examines how the speed of the function varies with respect to the selection of arguments. The computers tested in the present paper are Cray C9016E/16256– 4, Fujitsu VP2600/10, Hitachi S-3800/480 and NEC SX-3/14R.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123460574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
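The measurement model above assumes CPU time grows linearly with the number of arguments. The sketch below performs the same kind of fit in Python, timing NumPy elementwise functions as a stand-in for Fortran intrinsics; the function list, the vector lengths, and the least-squares fit via np.polyfit are illustrative assumptions, not the paper's methodology or machines.

```python
import time
import numpy as np

def time_elementwise(func, n, repeats=20):
    """Median wall-clock time to evaluate `func` on a length-n argument vector."""
    x = np.linspace(0.1, 1.0, n)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        func(x)
        samples.append(time.perf_counter() - t0)
    return float(np.median(samples))

if __name__ == "__main__":
    # Fit t(n) ~ startup + per_elem * n, a linear model in the number of
    # arguments, with NumPy functions standing in for Fortran intrinsics.
    sizes = np.array([1_000, 10_000, 100_000, 1_000_000])
    for name, func in [("sqrt", np.sqrt), ("exp", np.exp), ("sin", np.sin)]:
        times = np.array([time_elementwise(func, n) for n in sizes])
        per_elem, startup = np.polyfit(sizes, times, 1)   # slope, intercept
        print(f"{name:5s}: startup ~ {startup * 1e6:8.2f} us, "
              f"~{1.0 / per_elem / 1e6:8.1f} Mresults/s")
```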
Block Preconditioned Conjugate Gradient Methods on a Distributed Virtual Shared Memory Multiprocessor
Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000105
L. Giraud
{"title":"Block Preconditioned Conjugate Gradient Methods on a Distributed Virtual Shared Memory Multiprocessor","authors":"L. Giraud","doi":"10.1142/S0129053395000105","DOIUrl":"https://doi.org/10.1142/S0129053395000105","url":null,"abstract":"We study both shared and distributed approaches for the parallel implementation of the SSOR and Jacobi block preconditioned Krylov methods on a distributed virtual shared memory computer: a BBN TC2000. We consider the solution of block tridiagonal systems arising from the discretization of 3D partial differential equations, which diagonal blocks correspond to the discretization of 2D partial differential equations. The solution of the diagonal subproblems required for the preconditionings are performed using a domain decomposition method with overlapped subdomains: a variant of the Schwarz alternating method.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124538696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
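A minimal dense illustration of block preconditioning inside conjugate gradients is sketched below, using an exact block Jacobi preconditioner on a small 1D Laplacian. The block size, the model matrix, and the dense solves are assumptions for illustration only; the paper's SSOR/Jacobi block preconditioners with overlapping Schwarz subdomain solves on the BBN TC2000 are not reproduced.

```python
import numpy as np

def block_jacobi_pcg(A, b, block_size, tol=1e-8, max_iter=500):
    """Conjugate gradients preconditioned by a block Jacobi preconditioner:
    M^{-1} applies the exact inverse of each diagonal block of A."""
    n = A.shape[0]
    blocks = [slice(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    inv_blocks = [np.linalg.inv(A[s, s]) for s in blocks]

    def apply_prec(r):
        z = np.empty_like(r)
        for s, inv in zip(blocks, inv_blocks):
            z[s] = inv @ r[s]
        return z

    x = np.zeros(n)
    r = b - A @ x
    z = apply_prec(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k + 1
        z = apply_prec(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

if __name__ == "__main__":
    # 1D Laplacian as a tiny stand-in for the block tridiagonal systems
    # arising from the PDE discretizations discussed in the paper.
    n = 64
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    x, iters = block_jacobi_pcg(A, b, block_size=8)
    print("iterations:", iters, " residual:", np.linalg.norm(A @ x - b))
```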
A General-Purpose Parallel Sorting Algorithm
Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000166
A. Tridgell, R. Brent
{"title":"A General-Purpose Parallel Sorting Algorithm","authors":"A. Tridgell, R. Brent","doi":"10.1142/S0129053395000166","DOIUrl":"https://doi.org/10.1142/S0129053395000166","url":null,"abstract":"A parallel sorting algorithm is presented for general purpose internal sorting on MIMD machines. The algorithm initially sorts the elements within each node using a serial sorting algorithm, then proceeds with a two-phase parallel merge. The algorithm is comparison-based and requires additional storage of order the square root of the number of elements in each node. Performance of the algorithm on the Fujitsu AP1000 MIMD supercomputer is discussed.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"430 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123573903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
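The overall sort-locally-then-merge structure can be simulated serially as below: each "node" sorts its own chunk, and the sorted runs are then combined. The k-way merge via heapq.merge stands in for the paper's two-phase parallel merge and its square-root extra-storage bound; the node count and the data are illustrative.

```python
import heapq
import random

def parallel_sort_sketch(nodes):
    """Sort-locally-then-merge, simulated serially.

    Phase 1: each 'node' sorts its own elements (a serial sort per MIMD node
    in the paper). Phase 2: the sorted runs are merged; here a simple k-way
    merge stands in for the paper's two-phase parallel merge.
    """
    runs = [sorted(chunk) for chunk in nodes]   # phase 1: local sorts
    return list(heapq.merge(*runs))             # phase 2: merge the runs

if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(10_000)]
    nodes = [data[i::8] for i in range(8)]      # split across 8 simulated nodes
    merged = parallel_sort_sketch(nodes)
    print(merged == sorted(data))               # True
```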
Factorized Sparse Approximate Inverse Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers
Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000117
L. Kolotilina, A. Yeremin
{"title":"Factorized Sparse Approximate Inverse Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers","authors":"L. Kolotilina, A. Yeremin","doi":"10.1142/S0129053395000117","DOIUrl":"https://doi.org/10.1142/S0129053395000117","url":null,"abstract":"An iterative method for solving large linear systems with sparse symmetric positive definite matrices on massively parallel computers is suggested. The method is based on the Factorized Sparse Approximate Inverse (FSAI) preconditioning of ‘parallel’ CG iterations. Efficiency of a concurrent implementation of the FSAI-CG iterations is analyzed for a model hypercube, and an estimate of the optimal hypercube dimension is derived. For finite element applications, two strategies for selecting the preconditioner sparsity pattern are suggested. A high convergence rate of the resulting iterations is demonstrated numerically for the 3D equilibrium equations for linear elastic orthotropic materials approximated using both h- and p-versions of the FEM.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125318516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 66
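The sketch below builds a toy factorized sparse approximate inverse for a small, densely stored SPD matrix: a lower-triangular G with a prescribed sparsity pattern such that G A G^T approximates the identity, following the standard FSAI recipe of small local solves plus diagonal scaling. The pattern choice (the lower triangle of A's nonzeros), the dense storage, and the model matrix are assumptions; the paper's FE-specific pattern strategies and its massively parallel implementation are not reproduced.

```python
import numpy as np

def fsai_factor(A, pattern=None):
    """Toy Factorized Sparse Approximate Inverse: a lower-triangular G with a
    prescribed sparsity pattern such that G A G^T approximates the identity.
    The pattern defaults to the lower triangle of A's nonzeros."""
    n = A.shape[0]
    if pattern is None:
        pattern = [np.flatnonzero(A[i, : i + 1]) for i in range(n)]
    G = np.zeros_like(A, dtype=float)
    for i in range(n):
        J = np.asarray(pattern[i])          # column indices j <= i, includes i
        e = np.zeros(len(J))
        e[J == i] = 1.0
        # Solve the small local system A[J, J] y = e_i restricted to J ...
        y = np.linalg.solve(A[np.ix_(J, J)], e)
        # ... and scale the row so that G A G^T has a unit diagonal.
        G[i, J] = y / np.sqrt(y[J == i])
    return G

if __name__ == "__main__":
    n = 50
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # SPD model problem
    G = fsai_factor(A)
    M = G @ A @ G.T
    print("cond(A)       =", round(np.linalg.cond(A), 1))
    print("cond(G A G^T) =", round(np.linalg.cond(M), 1))
```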
Extensions to Cycle Shrinking
Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000154
A. Sethi, S. Biswas, A. Sanyal
{"title":"Extensions to Cycle Shrinking","authors":"A. Sethi, S. Biswas, A. Sanyal","doi":"10.1142/S0129053395000154","DOIUrl":"https://doi.org/10.1142/S0129053395000154","url":null,"abstract":"An important part of a parallelizing compiler is the restructuring phase, which extracts parallelism from a sequential program. We consider an important restructuring transformation called cycle shrinking [5], which partitions the iteration space of a loop so that the iterations within each group of the partition can be executed in parallel. The method in [5] mainly deals with dependences with constant distances. In this paper, we propose certain extensions to the cycle shrinking transformation. For dependences with constant distances, we present an algorithm which, under certain fairly general conditions, partitions the iteration space in a minimal number of groups. Under such conditions, our method is optimal while the previous methods are not. We have also proposed an algorithm to handle a large class of loops which have dependences with variable distances. This problem is considerably harder and has not been considered before in full generality.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128422085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
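Cycle shrinking for a single dependence of constant distance d can be illustrated as below: the iteration space is cut into groups of d consecutive iterations, iterations inside a group are mutually independent and may run concurrently, and the groups themselves execute in order. The example loop body, the thread pool, and the group size are illustrative assumptions; the paper's minimal-partition and variable-distance extensions are not shown.

```python
from concurrent.futures import ThreadPoolExecutor

# Original serial loop:  for i in range(d, N): a[i] = a[i - d] + 1
# The only dependence has constant distance d, so the iteration space is
# partitioned into groups of d consecutive iterations: within a group there
# are no dependences, while each group reads only values written by earlier
# groups.

def cycle_shrunk_loop(a, d):
    N = len(a)
    def body(i):
        a[i] = a[i - d] + 1
    with ThreadPoolExecutor() as pool:
        for start in range(d, N, d):              # groups stay sequential
            group = range(start, min(start + d, N))
            list(pool.map(body, group))           # iterations in a group run in parallel
    return a

if __name__ == "__main__":
    d, N = 4, 20
    serial = list(range(N))
    for i in range(d, N):
        serial[i] = serial[i - d] + 1
    parallel = cycle_shrunk_loop(list(range(N)), d)
    print(parallel == serial)   # True
```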
Task Distribution on a Butterfly Multiprocessor
Int. J. High Speed Comput. Pub Date : 1995-03-01 DOI: 10.1142/S0129053395000026
I. Gottlieb, A. Herold
{"title":"Task Distribution on a Butterfly Multiprocessor","authors":"I. Gottlieb, A. Herold","doi":"10.1142/S0129053395000026","DOIUrl":"https://doi.org/10.1142/S0129053395000026","url":null,"abstract":"We consider the practical performance of dynamic task distribution on a multiprocessor, where overloaded processors dispense tasks to be performed on idle ones which are free to execute them. We propose a topology and an algorithm for routing packets in a network from an arbitrary subset of processors S to an arbitrary subset T, where the exact target node within T for a particular task is unimportant and therefore not specified. The method presented achieves work distribution in O(10* log N) time, where N is the nodes (processors) number. It operates on a Duplex Butterfly, and requires O(log N) size buffers. The solution is dynamic, taking into consideration real time availability of processors, and deterministic. The mechanism includes throttling of the task generation rate. “Software synchronization” in asynchronous mode ensures the insensitivity of the algorithm to hardware propagation delays of signals in large networks.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"278 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125849815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0