Int. J. High Speed Comput.: Latest Articles

Non-Sequential Instruction Cache Prefetching for Multiple-Issue Processors
Int. J. High Speed Comput. Pub Date : 1999-03-01 DOI: 10.1142/S0129053399000065
A. Veidenbaum, Qing Zhao, Abduhl Shameer
Abstract: This paper presents a novel instruction cache prefetching mechanism for multiple-issue processors. Such processors at high clock rates often have to use a small instruction cache, which can have significant miss rates. Prefetching from secondary cache or even memory can hide the instruction cache miss penalties, but only if initiated sufficiently far ahead of the current program counter. Existing instruction cache prefetching methods are strictly sequential and do not prefetch past conditional branches, which may occur almost every clock cycle in wide-issue processors. In this study, multi-level branch prediction is used to overcome this limitation. By keeping branch history and target addresses, two methods are defined to predict a future PC several branches past the current branch. A prefetching architecture using such a mechanism is defined and evaluated with respect to its accuracy, the impact of the instruction prefetching on performance, and its interaction with sequential prefetching. Both PC-based and history-based predictors are used to perform a single-lookup prediction. Targeting an on-chip L2 cache with low latency, prediction for 3 branch levels is evaluated for a 4-issue processor and cache architecture patterned after the DEC Alpha-21164. It is shown that the history-based predictor is more accurate, but both predictors are effective. The prefetching unit using them can be effective and succeeds when the sequential prefetcher fails. In addition, non-sequential prefetching is better at hiding latency due to earlier initiation. The two types of prefetching eliminate different types of misses and thus can be effectively combined to achieve better performance.
Citations: 11
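The history-based, single-lookup idea described in the abstract above can be sketched roughly as follows. This is a toy model under stated assumptions, not the paper's design: the table keyed by (branch PC, recent outcome history), the `depth` and `history_bits` parameters, and every name in the code are hypothetical and chosen only for illustration.

```python
from collections import deque


class MultiLevelPredictor:
    """Toy predictor: maps (branch PC, recent outcome history) to the PC
    reached `depth` branches later. The layout is an assumption."""

    def __init__(self, depth=3, history_bits=4):
        self.depth = depth
        self.history_bits = history_bits
        self.history = 0          # recent taken/not-taken outcomes as bits
        self.table = {}           # (pc, history) -> PC observed `depth` branches on
        self.pending = deque()    # keys still waiting to learn their future PC

    def predict(self, pc):
        """Single lookup: return a prefetch candidate address, or None."""
        return self.table.get((pc, self.history))

    def update(self, pc, taken, next_pc):
        """Record a resolved branch at `pc` after which execution continued at `next_pc`."""
        self.pending.append((pc, self.history))
        # The key queued depth-1 branches ago now learns where execution went.
        if len(self.pending) == self.depth:
            old_key = self.pending.popleft()
            self.table[old_key] = next_pc
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

A prefetch unit built on such a table would call `predict` at each branch and, on a hit, request the returned address from the next cache level; how that request interacts with a sequential prefetcher is outside the scope of this sketch.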
Grouping Memory Consistency Model for Parallel-Multithreaded Shared-Memory Multiprocessor Systems
Int. J. High Speed Comput. Pub Date : 1999-03-01 DOI: 10.1142/S0129053399000041
Chao-Chin Wu, Cheng Chen
Abstract: In this paper, we propose a hardware-centric memory consistency model designed for shared-memory multiprocessors with parallel-multithreaded processing elements. Based on the behavior of critical sections and the features of parallel-multithreaded processors, we extend the release consistency model to a more relaxed memory model. A release reference at the end of a critical section can be executed locally regardless of whether all of its previous ordinary references have performed. The requirement is that another thread on the same processor is waiting for the lock to be freed. Two new instructions and two additional macros are needed to properly label a program for our proposed model. Moreover, we use a table per processing element to determine whether any threads are waiting for a specific lock. We have used five benchmark programs in the SPLASH suite to evaluate the performance gain of the new model. According to the simulation results, our proposed model outperforms the release consistency model by up to 25%.
Citations: 1
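The per-processing-element table mentioned in the abstract above might look something like the following sketch. The class, the function `release_mode`, its return values, and the `pending_ordinary_refs` counter are all hypothetical names introduced here; the abstract does not describe the table's actual format or the two new instructions.

```python
from collections import defaultdict


class LockWaitTable:
    """Per-processing-element table of threads waiting on each lock (assumed layout)."""

    def __init__(self):
        self._waiters = defaultdict(set)   # lock id -> set of waiting thread ids

    def add_waiter(self, lock_id, thread_id):
        self._waiters[lock_id].add(thread_id)

    def remove_waiter(self, lock_id, thread_id):
        self._waiters[lock_id].discard(thread_id)

    def has_local_waiter(self, lock_id):
        # .get avoids creating an empty entry on a pure query.
        return bool(self._waiters.get(lock_id))


def release_mode(table, lock_id, pending_ordinary_refs):
    """Classify a release at the end of a critical section.

    "local":  a thread on the same processor is waiting, so the release may
              proceed without waiting for earlier ordinary references.
    "wait":   no local waiter and earlier references are still outstanding.
    "global": no local waiter and nothing outstanding.
    """
    if table.has_local_waiter(lock_id):
        return "local"
    if pending_ordinary_refs > 0:
        return "wait"
    return "global"
```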
Fault-Tolerant Characteristics and Topological Properties of a Hierarchical Network of Hypercubes
Int. J. High Speed Comput. Pub Date : 1999-03-01 DOI: 10.1142/S0129053399000028
A. Jayadevan, L. Patnaik
Abstract: We analyse the fault-tolerant parameters and topological properties of a hierarchical network of hypercubes. We take a close look at the Extended Hypercube (EH) and the Hyperweave (HW) architectures and also compare them with other popular architectures. These two architectures have low diameter and constant degree of connectivity, making it possible to expand these networks without affecting the existing configuration. A scheme for incrementally expanding this network is also presented. We also look at the performance of the ASCEND/DESCEND class of algorithms on these architectures.
Citations: 4
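For context on why constant degree matters for expandability, the short sketch below computes neighbours, distance, and diameter for a plain binary hypercube, where both degree and diameter equal the dimension and therefore grow as the network grows. It does not model the EH or HW architectures, whose definitions are not given in the abstract; the function names are illustrative.

```python
def neighbors(node, dim):
    """Neighbours of `node` in a dim-dimensional binary hypercube: flip one bit each."""
    return [node ^ (1 << bit) for bit in range(dim)]


def distance(a, b):
    """Shortest-path length between two nodes, i.e. the Hamming distance of their labels."""
    return bin(a ^ b).count("1")


def diameter(dim):
    """Largest distance between any two nodes; by symmetry node 0 suffices as one endpoint."""
    return max(distance(0, v) for v in range(1 << dim))
```

Because each node has `dim` links, adding a dimension to a plain hypercube requires a new port on every existing node, which is the expansion cost a constant-degree design avoids.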
Scalability of Sparse Cholesky Factorization
Int. J. High Speed Comput. Pub Date : 1999-03-01 DOI: 10.1142/S012905339900003X
T. Rauber, G. Rünger, C. Scholtes
Abstract: A variety of algorithms have been proposed for sparse Cholesky factorization, including left-looking, right-looking, and supernodal algorithms. This article investigates shared-memory implementations of several variants of these algorithms in a task-oriented execution model with dynamic scheduling. In particular, we consider the degree of parallelism, the scalability, and the scheduling overhead of the different algorithms. Our emphasis lies on the parallel implementation for relatively large numbers of processors. As the execution platform, we use the SB-PRAM, a shared-memory machine with up to 2048 processors. This article can be considered a case study in which we try to answer the question of what performance we can hope to achieve for a typical irregular application on an ideal machine on which the locality of memory accesses can be ignored but for which the overhead for the management of data structures still takes effect. The investigation shows that certain algorithms are the best choice for a small number of processors, while other algorithms are better for many processors.
Citations: 2
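To make the term "left-looking" in the abstract above concrete, here is a minimal sequential sketch on a dense matrix stored as nested lists. It is only an illustration of the column ordering: a real sparse code would operate on compressed column structures, and the task-oriented parallel variants studied in the article are not reproduced here.

```python
import math


def cholesky_left_looking(a):
    """Return lower-triangular L with L * L^T == a, for symmetric positive definite a.

    Left-looking order: column j first gathers the updates from all finished
    columns k < j, and only then is scaled by its diagonal entry.
    """
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Start from the lower part of column j of the input matrix.
        col = [a[i][j] for i in range(j, n)]
        # "Look left": subtract the contributions of finished columns.
        for k in range(j):
            ljk = L[j][k]
            if ljk != 0.0:  # a sparse code would skip structural zeros here
                for i in range(j, n):
                    col[i - j] -= L[i][k] * ljk
        diag = math.sqrt(col[0])
        L[j][j] = diag
        for i in range(j + 1, n):
            L[i][j] = col[i - j] / diag
    return L
```

A right-looking variant would instead push the updates from column j into all later columns as soon as column j is finished; the arithmetic is the same, but the dependency pattern between column tasks differs.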
An Improved Mapping of Cyclic Elimination onto Hypercubes Using Data Replication
Int. J. High Speed Comput. Pub Date : 1997-12-01 DOI: 10.1142/S0129053397000180
Kartik Gopalan, C. Murthy
Abstract: In this paper, we propose a new mapping of the Cyclic Elimination (CE) algorithm for the solution of block tridiagonal linear systems of equations onto hypercube multiprocessors. Unlike previous mapping schemes, our mapping of the CE algorithm uses data replication to restrict all communication to physically adjacent processors. The effectiveness of our mapping is demonstrated by comparing it with the existing mapping of the Cyclic Reduction algorithm onto hypercubes using both analytical and simulation methods.
Citations: 1
New Parallel Algorithms for Direct Solution of Sparse Linear Systems: Part I - Symmetric Coefficient Matrix
Int. J. High Speed Comput. Pub Date : 1997-12-01 DOI: 10.1142/S0129053397000167
Kartik Gopalan, C. Murthy
Abstract: In this paper, we propose a new parallel bidirectional algorithm, based on Cholesky factorization, for the solution of sparse symmetric systems of linear equations. Unlike existing algorithms, the numerical factorization phase of our algorithm is carried out in such a manner that the entire back substitution component of the substitution phase is replaced by a single-step division. Since this substantially reduces the time taken by repeated execution of the substitution phase, our algorithm is particularly suited to solving systems with multiple b-vectors. The effectiveness of our algorithm is demonstrated by comparing it with the existing parallel algorithm, based on Cholesky factorization, using extensive simulation studies on two-dimensional problems discretized by FEM.
Citations: 0
Dynamic Load Distribution on Meshes with Broadcasting
Int. J. High Speed Comput. Pub Date : 1997-12-01 DOI: 10.1142/S0129053397000192
W. Lee, S. Hong, Jong Kim
Abstract: In this paper, we propose a mesh with a global bus as a multi-computer topology. This structure enhances the communication capability of the mesh, and we show that the mesh with a global bus has more salient properties than the mesh, the hypercube, or other variants. These properties include a small diameter, a relatively small degree, a small average distance, suitability for broadcasting, and a small initial data distribution time. We propose a dynamic load distribution algorithm to utilize the enhanced communication capability of the mesh with a global bus. In addition, asynchronous bus control and arbitration logic is designed to support the proposed algorithm efficiently. Simulation shows that the proposed dynamic load distribution is superior to the Receiver Initiated Diffusion method, previously regarded as the best known to date. The proposed algorithm achieves a shorter total task execution time and better processor utilization with fewer task migrations.
Citations: 2
Efficient Multicast on Wormhole Switch-Based NOWP
Int. J. High Speed Comput. Pub Date : 1997-12-01 DOI: 10.1142/S0129053397000209
Kuo-Pao Fan, C. King
Abstract: High-bandwidth, low-latency switches are commercially available. Using these switches, it becomes possible to build a system area network that interconnects workstations and processor clusters to provide a cost-effective parallel computing platform. A processor cluster may be, for example, a shared-memory multiprocessor or a mesh-connected multicomputer. The interconnection topology on this kind of platform, called switch-based NOWP, is usually irregular. On such systems, multicast is an important collective communication operation. Two steps are involved in a multicast: (1) the source node sends the multicast message to the destinations which are connected to a switch directly or are the leader of a processor cluster, and (2) the leader node of each cluster sends the message to other destinations in the same cluster. In this paper, we propose two unicast-based multicast algorithms. Algorithm Multicast_1 performs those two steps sequentially, while Algorithm Multicast_2 overlaps them. The performance of the two algorithms is evaluated and compared.
Citations: 1
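The two steps listed in the abstract above can be sketched as a simple partition of the destination set. This is not Multicast_1 or Multicast_2 themselves, whose unicast schedules are not given in the abstract; the function name, the `cluster_of` and `leader_of` encodings, and the choice to route through a leader even when it is not itself a destination are assumptions made for illustration.

```python
def plan_two_step_multicast(source, destinations, cluster_of, leader_of):
    """Split a multicast into the two steps named in the abstract.

    cluster_of: node -> cluster id; nodes absent from the mapping are treated
                as attached directly to a switch (hypothetical encoding).
    leader_of:  cluster id -> leader node of that cluster.
    Returns (step1, step2): step1 lists the nodes the source must reach;
    step2 maps each involved leader to the cluster members it forwards to.
    """
    step1, step2 = [], {}
    for dest in destinations:
        if dest == source:
            continue
        cluster = cluster_of.get(dest)
        if cluster is None:
            step1.append(dest)            # directly attached to a switch
            continue
        leader = leader_of[cluster]
        if leader not in step2:
            step2[leader] = []
            if leader != source:
                step1.append(leader)      # leader receives on behalf of its cluster
        if dest != leader:
            step2[leader].append(dest)
    return step1, step2
```

A sequential scheme would finish every send in `step1` before any leader starts forwarding, whereas an overlapped scheme would let a leader begin its `step2` sends as soon as it has the message.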
Mapping Pipelined Divided-difference Computations into Hypercubes
Int. J. High Speed Comput. Pub Date : 1997-09-01 DOI: 10.1142/S012905339700012X
K. Chung, Yu-Wei Chen
Abstract: In numerical computations, the method of divided differences is a very important technique for polynomial approximations. Consider a pipelined divided-difference computation for approximating an nth-degree polynomial. This paper first presents a method to transform the computational structure of divided differences into the pyramid tree with nodes. Based on a graph embedding technique, without any extra communication delay, the pipelined divided-difference computation can be performed in a (2k + 1)-dimensional fault-free hypercube for n + 1 = 2^k + t, k > 0, and 0 < t < 2^k; the pipelined divided-difference computation can further be performed in a (2k + 2)-dimensional faulty hypercube to tolerate arbitrary (k - 1) faulty nodes/links. To the best of our knowledge, this is the first time such mapping methods have been proposed in the literature.
Citations: 0
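As background for the computation being mapped in the abstract above, the sketch below builds the standard triangular table of Newton divided differences and evaluates the resulting interpolant. It shows only the sequential data dependencies (each entry depends on two entries of the previous level); the pyramid-tree transformation and the hypercube embedding are not reproduced, and the function names are illustrative.

```python
from fractions import Fraction


def divided_difference_table(xs, ys):
    """Return the triangular table with table[level][i] = f[x_i, ..., x_{i+level}]."""
    n = len(xs)
    table = [[Fraction(y) for y in ys]]
    for level in range(1, n):
        prev = table[-1]
        row = [
            (prev[i + 1] - prev[i]) / (Fraction(xs[i + level]) - Fraction(xs[i]))
            for i in range(n - level)
        ]
        table.append(row)
    return table


def newton_eval(xs, table, x):
    """Evaluate the Newton-form interpolant using the first entry of each level."""
    result = Fraction(0)
    basis = Fraction(1)
    for level, row in enumerate(table):
        result += row[0] * basis
        basis *= Fraction(x) - Fraction(xs[level])
    return result
```

For n + 1 sample points the table has n + 1 levels of shrinking length, so level `l` holds n + 1 - l entries; entries within one level are independent of each other, which is what makes a pipelined or parallel evaluation possible.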
Two Real-Time Flow Controls in Wormhole Networks
Int. J. High Speed Comput. Pub Date : 1997-09-01 DOI: 10.1142/S0129053397000155
Hyojeong Song, Boseob Kwon, Ji-Yun Kim, H. Yoon
Abstract: In this paper, we study wormhole-routed networks and examine their suitability for real-time traffic in a priority-driven paradigm. Traditional blocking flow control in wormhole routing may lead to priority inversion, in the sense that high-priority packets are blocked by low-priority packets for an unbounded time. Priority inversion causes frequent deadline misses even at a low network load. This paper therefore proposes two preemptive flow control policies in which high-priority packets can preempt network resources held by low-priority packets. As a result, the proposed flow controls can resolve priority inversion. Our simulations show that preemptive flow controls significantly reduce deadline miss ratios for various real-time traffic configurations.
Citations: 1