{"title":"The Communication Machine","authors":"P. Swarztrauber","doi":"10.1142/S0129053304000207","DOIUrl":"https://doi.org/10.1142/S0129053304000207","url":null,"abstract":"The Communication Machine brings to the multicomputer what vectorization brought to the uniprocessor. It provides the same tools to speed communication that have traditionally been used to speed computation; namely, the capability to program optimal communication algorithms on an architecture that can, to the extent possible, replicate their performance in terms of wall-clock time. In addition to the usual complement of logic and arithmetic units, each module contains a programmable communication unit that orchestrates traffic between the network and registers that communicate directly with comparable registers in neighboring modules. Communication tasks are performed out of these registers like computational tasks on a vector uniprocessor. The architecture is balanced in the sense that, on average, the speed of local and global memory is comparable. Theoretical performance is tabulated for both hypercube and mesh interconnection networks. The Communication Machine returns to the somewhat beleaguered, yet intuitive concept that the performance we ultimately seek must come from a truly massive number of processors.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131060426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-Parallel Computation of Pseudo-Adjoints for a Leapfrog Scheme","authors":"C. Bischof","doi":"10.1142/S0129053304000219","DOIUrl":"https://doi.org/10.1142/S0129053304000219","url":null,"abstract":"The leapfrog scheme is a commonly used second-order difference scheme for solving differential equations. If Z(t) denotes the state of a system at a particular time step t, the leapfrog scheme computes the state at the next time step as Z(t+1)=H(Z(t),Z(t-1),W), where H is the nonlinear time-stepping operator and W represents parameters that are not time-dependent. In this note, we show how the associativity of the chain rule of differential calculus can be used to compute a so-called adjoint, the derivative of a scalar-valued function applied to the final state Z(T) with respect to some chosen parameters, efficiently in a parallel fashion. To this end, we (1) employ the reverse mode of automatic differentiation at the outermost level, (2) use a sparsity-exploiting version of the forward mode of automatic differentiation to compute derivatives of H at every time step, and (3) exploit chain rule associativity to compute derivatives at individual time steps in parallel. We report on experimental results with a 2-D shallow water equations model problem on an IBM SP parallel computer and a network of Sun SPARCstations.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121267559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparisons of the Parallel Preconditioners for Large Nonsymmetric Sparse Linear Systems on a Parallel Computer","authors":"Sangback Ma","doi":"10.1142/S0129053304000232","DOIUrl":"https://doi.org/10.1142/S0129053304000232","url":null,"abstract":"In this paper we compare various parallel preconditioners for solving large sparse nonsymmetric linear systems. They are Block Jacobi, Point-SSOR, ILU(0) in the wavefront order, ILU(0) in the multi-color order, SPAI(SParse Approximate Inverse), and Multi-Color Block SOR. The Block Jacobi and Point-SSOR are well-known, and ILU(0) is one of the most popular preconditioners, but it is inherently serial. ILU(0) in the wavefront order maximizes the parallelism, and ILU(0) in the multi-color order achieves the parallelism of order (N), where N is the order of the matrix. The SPAI tries to capture the approximate inverse in sparse form, which, then, is expected to be a scalable preconditioner. Finally, we implemented the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver. For the Laplacian matrix the SOR method is known to have a non-deteriorating rate of convergence when used with Multi-Color ordering. Since most of the time is spent on the diagonal inversion, which is done on each processor, we expect it to be a good scalable preconditioner. Finally, due to the blocking effect, it will be effective for ill-conditioned problems. Experiments were conducted for the Finite Difference discretizations of two problems with various meshsizes varying up to 1024×1024, and for an ill-conditioned matrix from the shell problem from the Harwell–Boeing collection. CRAY-T3E with 128 nodes was used. MPI library was used for interprocess communications. The results show that Multi-Color Block SOR and ILU(0) with Multi-Color ordering give the best performances for the finite difference matrices and for the shell problem only the Multi-Color Block SOR and Block Jacobi converges. Based on this we recommend that the Multi-Color Block SOR is the most robust preconditioner out of the preconditioners considered.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124822979","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A GA Based Multiple Task Allocation Considering Load","authors":"A. Tripathi, B. K. Sarker, Naveen Kumar, D. P. Vidyarthi","doi":"10.1142/S0129053300000187","DOIUrl":"https://doi.org/10.1142/S0129053300000187","url":null,"abstract":"A Distributed Computing System (DCS) comprising networked heterogeneous processors requires ecient tasks to processor allocation to achieve minimum turnaround time and highest possible throughput. Task allocation in DCS remains an important and relevant problem attracting the attention of researchers in the discipline. A good number of task allocation algorithms have been proposed in the literature [3{9]. This algorithm considered allocation of the modules of a single task to various processing nodes and aim to minimize the turnaround time of the given task. But they did not consider execution of modules belonging to various dierent tasks (i.e. multiple tasks). In this work we have considered the number of modules that can be accepted by individual processing nodes along with their memory capacities and arrival of multiple disjoint tasks to the DCS from time to time. In this paper, a method based on genetic algorithm is developed which is memory ecient and give an optimal solution of the problem. The given simulation results also show signicant achievement in this regard.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128034512","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enchanced Linked-Based Cache Coherence Protocols with a Hardware Mechanism to Reduce the Migratory Sharing Overhead","authors":"Der-Lin Pean, Cheng Chen","doi":"10.1142/S0129053300000163","DOIUrl":"https://doi.org/10.1142/S0129053300000163","url":null,"abstract":"The linked-based cache coherence protocols, such as the IEEE Scalable Coherence Interface (SCI), have been widely implemented in current highly scalable multiprocessor systems. Thus, we propose several enhanced linked-based cache coherence protocols in multiprocessor systems to evaluate their performance. However, migratory sharing data references in the linked-based systems still incur many cache misses that can be reduced by merging the invalidation/update requests and the cache misses. Research has been devoted to optimizing the migratory sharing references for the centralized directory coherence protocols, but their mechanisms cannot support the linked-based cache coherence protocols. This paper presents enhanced SCI protocols with an effective hardware technique to reduce the overhead of migratory sharing references for the linked-based cache coherence protocols. It reduces cost by eliminating some of the unnecessary supporting mechanisms in centralized directory protocols. The simulation results in SPLASH benchmarks show that our hardware methods enhanced the system performance by up to an average of 10%, by reducing the overhead of the migratory sharing references. The extra benefit of our mechanism is the elimination of the false sharing overhead by degrading a block to shared mode again.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"32 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125709723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Adaptive Fault-Tolerant Wormhole Routing Algorithm for Hypercubes","authors":"Jau-Der Shih","doi":"10.1142/S012905330000014X","DOIUrl":"https://doi.org/10.1142/S012905330000014X","url":null,"abstract":"In this paper, we present an adaptive fault-tolerant wormhole routing algorithm for hypercubes by using 4 virtual networks. Each node is identified to be in one of the four states: safe, ordinarily unsafe, strongly unsafe, and faulty. Based on the concept of unsafe nodes, we design a routing algorithm for hypercubes that can tolerate at least n-1 faulty nodes and can route a message via a path of length no more than the Hamming distance between the source and destination plus four. Previous algorithms which achieve the same fault tolerant ability need at least 5 virtual channels per physical channel. Simulation results show that our algorithm outperforms previous known results.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134482811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computation Time and Idle Time of Tiling Transformation on a Network of Workstations","authors":"S. Sathe, P. Nawghare","doi":"10.1142/S0129053300000126","DOIUrl":"https://doi.org/10.1142/S0129053300000126","url":null,"abstract":"Tiling is a technique for extraction of parallelism which groups iterations of a nest of \"for\" loops into blocks called tiles which can be scheduled for execution on the workstations connected by a network. Extraction of parallelism will be maximum when the workstations are busy in computation most of the time. Hence idle time of tiling is a very important parameter. In this paper we have presented results on the study of tiling transformation with respect to computation time and idle time. In our study we have considered tiles of rectangular shape and of size n1×n2. The iteration space can, however, be rectangular or parallelogram shaped and of size N1×N2. The results presented in this paper can be used for tiling of iteration spaces such that idle time is minimum and can be easily integrated in a parallelising compiler. Modelling communication between workstations is important for tiling transformation. We have developed a new improved model for modelling communication between workstations.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124516940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedding Hamiltonian Cycles, Linear Arrays and Rings in a Faulty Supercube","authors":"Jen-Chih Lin","doi":"10.1142/S0129053300000151","DOIUrl":"https://doi.org/10.1142/S0129053300000151","url":null,"abstract":"We consider the problem of finding Hamiltonian cycles, linear arrays and rings of a faulty supercube, if any. The proof of the existence of Hamiltonian cycles in hypercubes is easy due to the fact they are symmetric graphs. Since the supercube is asymmetric, the proof of the existence of Hamiltonian cycles is not trivial. We show that for any supercube SN, where N is the number of nodes in the supercube, there exists a Hamiltonian cycle. This implies that for any r such that 3≤r≤N, there exists a cycle of r nodes in a supercube. There are embedding algorithms proposed in this paper. The embedding algorithms show a ring with any number of nodes which can be embedded in a faulty supercube with load 1, congestion 1 and dilation 4 such that O(n2-(⌊log2 m⌋)2) faults can be tolerated, where n is the dimension of the supercube and m is the number of nodes of the ring.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128703258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the Execution Efficiency of Barrier Synchronization in Software DSM through Static Analysis","authors":"Jae Bum Lee, C. Jhon","doi":"10.1142/S0129053300000138","DOIUrl":"https://doi.org/10.1142/S0129053300000138","url":null,"abstract":"In software Distributed Shared Memory (SDSM) systems, the large coherence granularity imposed by virtual memory page size tends to induce false sharing, which may lead to heavy network traffic or useless page misses on barrier operations. In this paper, we propose a method to alleviate the coherence overhead of barrier synchronization in the SDSM systems. It performs static analysis on a shared-memory program to examine data dependency between processors across global barriers, and then special primitives are inserted into the program in order to exploit the dependency information at run time. If the data modified before a barrier will be accessed by some of the other processors after the barrier, coherence messages are transferred only to the processors through the inserted primitives. Furthermore, if the modified data will not be used by any other processors, the primitives enforce the coherence messages to be delivered only to master process after the parallel execution of the program completes. We implemented the static analysis with SUIF parallelizing compiler and then evaluated the execution performance of modified programs in a 16-node SDSM system supporting AURC protocol. The experimental results show that our method is very effective at reducing the useless coherence messages, and also can improve the execution time substantially by reducing false sharing misses.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130920620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"K-Means-Type Algorithms on Distributed Memory Computer","authors":"M. Ng","doi":"10.1142/S0129053300000096","DOIUrl":"https://doi.org/10.1142/S0129053300000096","url":null,"abstract":"Partitioning a set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means-type algorithm is best suited for implementing this operation because of its efficiency in clustering large numerical and categorical data sets. An efficient parallel k-means-type algorithm for clustering data sets on a distributed share-nothing parallel system is considered. It has a simple communication scheme which performs only one round of information exchange in every iteration. We show that the speedup of our algorithm is asymptotically linear when the number of objects is sufficiently large. We implement the parallel k-means-type algorithm on an IBM SP2 parallel machine. The performance studies show that the algorithm has nice parallelism in experiments.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"160 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130045975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}