Int. J. High Speed Comput.最新文献_第9页

Memory System Design in Superscalar Processing 超标量处理中的存储系统设计

Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S0129053395000233

N. Lu, C. Chung

{"title":"Memory System Design in Superscalar Processing","authors":"N. Lu, C. Chung","doi":"10.1142/S0129053395000233","DOIUrl":"https://doi.org/10.1142/S0129053395000233","url":null,"abstract":"In this paper, we study the memory system design for superscalar processing. Benchmarking is used to examine the execution behavior of load/store instructions, such as load/store parallelism and memory load/store port utilization. It is found that the use of only a single load/store port forms a system bottle-neck. A superscalar processor benefits from multiple load/store ports and system performance saturates with two load/store ports. The memory system must be carefully designed if multiple load/store ports are supported in a superscalar processor. Thus, we consider the design of the data cache subsystem. The data cache configurations we investigate include multiported cache, multibank cache, and duplicated cache. Through benchmarking, we find that the duplicated cache performs well in most benchmarks. Yet the cost of a duplicated cache is higher. In a superscalar multiprocessing environment, in order to properly maintain memory consistency, we must consider the load/store ordering of the processors. In superscalar processors, the load/store ordering may be in one of three forms: total ordering, load bypassing, and load forwarding. In this research, we conclude that to support the sequential consistency model, the load/store instructions must be totally ordered. Load bypassing and load forwarding are sufficient to support the processor consistency model.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116318642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

An Approximate Agreement Algorithm for Wraparound Meshes 环绕网格的近似一致算法

Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S0129053395000221

R. Cheng, C. Chung

引用次数: 0

Parallel Matrix Multiplication Algorithms on Hypercube Multiprocessors 超立方体多处理器上的并行矩阵乘法算法

Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S012905339500021X

Peizong Lee

引用次数: 3

Multithreaded Decoupled Architecture 多线程解耦架构

Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S0129053395000257

M. Dorojevets, V. Oklobdzija

{"title":"Multithreaded Decoupled Architecture","authors":"M. Dorojevets, V. Oklobdzija","doi":"10.1142/S0129053395000257","DOIUrl":"https://doi.org/10.1142/S0129053395000257","url":null,"abstract":"A new computer architecture called the Multithreaded Decoupled Architecture has been proposed for exploiting fine-grain parallelism. It develops further some of the ideas of parallel processing implemented in the Russian MARS-M computer in the 1980s. The MTD architecture aims at enhancing both total machine throughput and a single thread performance. To achieve this goal, we propose a two-level parallel computation model. Its low level defines the decoupled parallel execution of instructions within program fragments not containing branches. We will be referring to these fragments as basic blocks. The model’s high level defines the parallel execution of multiple basic blocks representing a function or procedure. This scheduling hierarchy reflects the MTD storage hierarchy. Together the scheduling and storage models allow a processor with multiple execution units to exploit several forms of parallelism within a procedure. The compiler provides the hardware with thread register usage masks to allow run-time enforcing of control and data dependencies between the high level threads. We present a possible implementation of the MTD-processor with multiple execution units and two-level distributed register memory.","PeriodicalId":270006,"journal":{"name":"Int. J. High Speed Comput.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116828787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 20

On Optimal Weighted Binary Trees 关于最优加权二叉树

Int. J. High Speed Comput. Pub Date : 1995-09-01 DOI: 10.1142/S0129053395000245

J. Pradhan, C. V. Sastry

引用次数: 1

Benchmarking Fortran Intrinsic Functions Fortran固有函数的基准测试

Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000129

Toru Nagai

引用次数: 1

Block Preconditioned Conjugate Gradient Methods on a Distributed Virtual Shared Memory Multiprocessor 分布式虚拟共享内存多处理器的块预条件共轭梯度方法

Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000105

L. Giraud

引用次数: 2

A Minimal Synchronization Overhead Affinity Scheduling Algorithm for Shared-Memory Multiprocessors 共享内存多处理器的最小同步开销关联调度算法

Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000130

Yi-Min Wang, R. Chang

引用次数: 4

A General-Purpose Parallel Sorting Algorithm 一种通用并行排序算法

Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000166

A. Tridgell, R. Brent

引用次数: 9

Factorized Sparse Approximate Inverse Preconditioning II: Solution of 3D FE Systems on Massively Parallel Computers 因式稀疏近似逆预处理II:大规模并行计算机上三维有限元系统的解

Int. J. High Speed Comput. Pub Date : 1995-06-01 DOI: 10.1142/S0129053395000117

L. Kolotilina, A. Yeremin

引用次数: 66