{"title":"Three-Dimensional Monte Carlo Device Simulation with Parallel Multigrid Solver","authors":"Can K. Sandalci, Ç. Coç, S. Goodnick","doi":"10.1142/S0129053397000143","DOIUrl":"https://doi.org/10.1142/S0129053397000143","url":null,"abstract":"We present the results in embedding a multigrid solver for Poisson's equation into the parallel 3D Monte Carlo device simulator, PMC-3D. First we have implemented the sequential multigrid solver, and embedded it into the Monte Carlo code which previously was using the sequential successive overrelaxation (SOR) solver. Depending on the convergence threshold, we have obtained significant speedups ranging from 5 to 15 on a single HP 712/80 workstation. We have also implemented the parallel multigrid solver by extending the partitioning algorithm and the interprocessor communication routines of the SOR solver in order to service multiple grids. The Monte Carlo code with the parallel multigrid Poisson solver is 3 to 9 times faster than the Monte Carlo code with the parallel SOR code, based on timing results on a 32-node nCUBE multiprocessor.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132175164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rationale and Strategy for a 21st Century Scientific Computing Architecture: the Case for Using Commercial Symmetric Multiprocessors as Supercomputers","authors":"W. Johnston","doi":"10.1142/S0129053397000131","DOIUrl":"https://doi.org/10.1142/S0129053397000131","url":null,"abstract":"In this paper we argue that the next generation of supercomputers will be based on tight-knit clusters of symmetric multiprocessor systems in order to: (i) provide higher capacity at lower cost; (ii) enable easy future expansion, and (iii) ease the development of computational science applications. This strategy involves recognizing that the current vector supercomputer user community divides (roughly) into two groups, each of which will benefit from this approach: One, the \"capacity\" users (who tend to run production codes aimed at solving the science problems of today) will get better throughput than they do today by moving to large symmetric multiprocessor systems (SMPs), and a second group, the \"capability\" users (who tend to be developing new computational science techniques) will invest the time needed to get high performance from cluster-based parallel systems. In addition to the technology-based arguments for the strategy, we believe that it also supports a vision for a revitalization of scientific computing. This vision is that an architecture based on commodity components and computer science innovation will: (i) enable very scalable high performance computing to address the high-end computational science requirements; (ii) provide better throughput and a more productive code development environment for production supercomputing; (iii) provide a path to integration with the laboratory and experimental sciences, and (iv) be the basis of an on-going collaboration between the scientific community, the computing industry, and the research computer science community in order to provide a computing environment compatible with production codes and dynamically increasing in both hardware and software capability and capacity. We put forward the thesis that the current level of hardware performance and sophistication of the software environment found in commercial symmetric multiprocessor (SMP) systems, together with advances in distributed systems architectures, make clusters of SMPs one of the highest-performance, most cost-effective approaches to computing available today. The current capacity users of the C90-like system will be served in such an environment by having more of several critical resources than the current environment provides: much more CPU time per unit of real time, larger memory per node and much larger memory per cluster; and the capability users are served by an MPP-like performance and an architecture that enables continuous growth into the future. In addition to these primary arguments, secondary advantages of SMP clusters include: the ability to replicate this sort of system in smaller units to provide identical computing environments at the home sites and laboratories of scientific users; the future potential for using the global Internet for interconnecting large clusters at a central facility with smaller clusters at other sites to form a very high capability system; and a rapidly growing base of supporting commercial","PeriodicalId":270006,"journal":{"name":"Int. J. 
High Speed Comput.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115536523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Solution of Dense Linear Systems on the k-Ary n-Cube Networks","authors":"A. Al-Ayyoub, K. Day","doi":"10.1142/S0129053397000088","DOIUrl":"https://doi.org/10.1142/S0129053397000088","url":null,"abstract":"In this paper a parallel algorithm for solving systems of linear equation on the k-ary n-cube is presented and evaluated for the first time. The proposed algorithm is of O(N3/kn) computation complexity and uses O(Nn) communication time to factorize a matrix of order N on the k-ary n-cube. This is better than the best known results for the hypercube, O(N log kn), and the mesh, , each with approximately kn nodes. The proposed parallel algorithm takes advantage of the extra connectivity in the k-ary n-cube in order to reduce the communication time involved in tasks such as pivoting, row/column interchanges, and pivot row and multipliers column broadcasts.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131055220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping Tridiagonal System Algorithms onto Mesh Connected Computers","authors":"M. Amor, Juan López, Francisco Argüello, E. Zapata","doi":"10.1142/S012905339700009X","DOIUrl":"https://doi.org/10.1142/S012905339700009X","url":null,"abstract":"In this work we apply a methodology for the parallelization of algorithms for tridiagonal solvers. We classify tridiagonal solvers as a function of their data flows and present a unified version of the projection of these algorithms onto computers with mesh topology and distributed memory. Finally, we evaluate the algorithms and compare them through specific tests on the Fujitsu AP1000.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"187 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115181781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locality Optimizations for Parallel Computing Using Data Access Information","authors":"M. Rinard","doi":"10.1142/S0129053397000118","DOIUrl":"https://doi.org/10.1142/S0129053397000118","url":null,"abstract":"Given the large communication overheads characteristic of modern parallel machines, optimizations that improve locality by executing tasks close to data that they will access may improve the performance of parallel computations. This paper describes our experience automatically applying locality optimizations in the context of Jade, a portable, implicitly parallel programming language designed for exploiting task-level concurrency. Jade programmers start with a program written in a standard serial, imperative language, then use Jade constructs to declare how parts of the program access data. The Jade implementation uses this data access information to automatically extract the concurrency and apply locality optimizations. We present performance results for several Jade applications running on the Stanford DASH machine. We use these results to characterize the overall performance impact of the locality optimizations. In our application set the locality optimization level has little effect on the performance of two of the applications and a large effect on the performance of the rest of the applications. We also found that, if the locality optimization level had a significant effect on the performance, the maximum performance was obtained when the programmer explicitly placed tasks on processors rather than relying on the scheduling algorithm inside the Jade implementation.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130555943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coordination of Distributed and Parallel Activities in the IWIM Model","authors":"G. A. Papadopoulos, F. Arbab","doi":"10.1142/S0129053397000106","DOIUrl":"https://doi.org/10.1142/S0129053397000106","url":null,"abstract":"We present an alternative way of designing new as well as using existing coordination models for parallel and distributed environments. This approach is based on a complete symmetry between and decoupling of producers and consumers, as well as a clear distinction between the computation and the coordination/ communication work performed by each process. The novel ideas are: (i) to allow both producer and consumer processes to communicate with each other in a fashion that does not dictate any one of them to have specific knowledge about the rest of the processes involved in a coordinated activity, and (ii) to introduce control or state driven changes (as opposed to the data-driven changes usually employed) to the current state of a computation. Although a direct realisation of this model in terms of a concrete coordination language does exist, we argue that the underlying principles can be applied to other similar models. We demonstrate our point by showing how the functionality of the proposed model can be realised in a general coordination framework, namely the Shared Dataspace one, using as driving force the Linda-based formalism. Our demonstration achieves the following objectives: (i) yields an alternative (control- rather than data-driven) Linda-based coordination framework, and (ii) does it in such a way that the proposed apparatus can be used for other Shared-Dataspace-like coordination formalisms with little modification.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"56 4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117216825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Bidirectional Cholesky Factorization Algorithm for Parallel Solution of Sparse Symmetric Positive Definite Systems","authors":"K. Murthy, C. Murthy","doi":"10.1142/S0129053397000064","DOIUrl":"https://doi.org/10.1142/S0129053397000064","url":null,"abstract":"In this paper, we consider the problem of solving sparse linear systems occurring in finite difference applications (or N × N grid problems, N being the size of the linear system). We propose a new algorithm for the problem which is based on the Cholesky factorization, a symmetric variant of Gaussian elimination tailored to symmetric positive definite systems. The algorithm employs a new technique called bidirectional factorization to produce the complete solution vector by solving only one triangular system against two triangular systems in the existing Cholesky factorization after the factorization phase. The effectiveness of the new algorithm is demonstrated by comparing its performance with that of the existing Cholesky factorization for solving regular finite difference grid problems on hypercube multiprocessors.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130730423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of ART1 and ART2 Artificial Neural Networks on Ring and Mesh Architectures","authors":"G. D. Ghare, L. Patnaik","doi":"10.1142/S0129053397000052","DOIUrl":"https://doi.org/10.1142/S0129053397000052","url":null,"abstract":"The Artificial Neural Networks (ANNs) are being used to solve a variety of problems in pattern recognition, robotic control, VLSI CAD and other areas. In most of these applications, a speedy response from the ANNs is imperative. However, ANNs comprise a large number of artificial neurons, and a massive interconnection network among them. Hence, implementation of these ANNs involves execution of computer-intensive operations. The usage of multiprocessor systems therefore becomes necessary. In this article, we have presented the implementation of ART1 and ART2 ANNs on ring and mesh architectures. The overall system design and implementation aspects are presented. The performance of the algorithm on ring, 2-dimensional mesh and n-dimensional mesh topologies is presented. The parallel algorithm presented for implementation of ART1 is not specific to any particular architecture. The parallel algorithm for ARTE is more suitable for a ring architecture.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132538921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cost Optimal Search Technique for the Knapsack Problem","authors":"D. Lou, Chinchen Chang","doi":"10.1142/S0129053397000027","DOIUrl":"https://doi.org/10.1142/S0129053397000027","url":null,"abstract":"The knapsack problem is known to be a typical NP-complete problem, which has 2n possible solutions to search over. Thus a task for solving the knapsack problem can be accomplished in 2n trials if an exhaustive search is applied. In the past decade, much effort has been devoted in order to reduce the computation time of this problem instead of exhaustive search. In 1984, Karnin proposed a brilliant parallel algorithm, which needs O(2n/6) processors to solve the knapsack problem in O(2n/2) time; that is, the cost of Karnin's parallel algorithm is O(22n/3). In this paper, we propose a fast search technique to improve Karnin's parallel algorithm by reducing the search time complexity of Karnin's parallel algorithm to be O(2n/3) under the same O(2n/6) processors available. Thus, the cost of the proposed parallel algorithm is O(2n/2). Furthermore, we extend this search technique to the case that the number of available processors is P = O(2x), where x ≥ 1. From the analytical results, we see that our search technique is indeed superior to the previously proposed methods. We do believe our proposed parallel algorithm is pragmatically feasible at the moment when multiprocessor systems become more and more popular.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134048959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Parallel Radix Sort Using a Reconfigurable Mesh","authors":"Ju-wook Jang, Kyung-Geun Lee","doi":"10.1142/S0129053397000040","DOIUrl":"https://doi.org/10.1142/S0129053397000040","url":null,"abstract":"In this paper, we present a parallel SIMD algorithm for radix sorting of N numbers of w bits each, taking O(w + N1/4) time with the VLSI area of O(N3/2 w2), 0 < w < N1/4. For w = log N, our algorithm improves a previous known solution on a similar architecture in time complexity by a factor of log N. Since our algorithm uses only radix sort for sorting of subsets and merging of them, no comparator is needed. Our algorithm satisfies the lower bound of AT2 complexity which mainly restricts the VLSI implementation of most sorting algorithms. The same result is obtained in another previously known solution, but it requires a comparator of size w.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121822636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}