{"title":"Data prefetching in multiprocessor vector cache memories","authors":"John W. C. Fu, J. Patel","doi":"10.1145/115952.115959","DOIUrl":"https://doi.org/10.1145/115952.115959","url":null,"abstract":"This paper reports the cache performance of a set of vectorized numerical program from the Perfect Club benchmarks. Using a low cost trace driven simularion technique we show how a non-prefetching vector cache can result in unpredictable performance and how rhis unpredictability makes it difficult to find a good block size. We describe two simple prefetch schemes to reduce the influence of long stride vector accesses and misses due IO block invalidations in mulliprocessor vector caches. These two schemes are shown to have better performance than a non-prefetching cache.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128820537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification and performance evaluation of instruction buffering techniques","authors":"L. John, P. T. Hulina, L. D. Coraor, Dhamir N. Mannai","doi":"10.1145/115952.115968","DOIUrl":"https://doi.org/10.1145/115952.115968","url":null,"abstract":"The speed disparity between processor and memory subsystenis has been bridged in many existing large- scale scientific computers and microproc.essors with the help of instruction burners or instruction caches. In this pa.per we 'classify t.hese bulrers into traditional in- struction buffers, conventional inst,ruct.ion caches and prefetch queues, det.ail their prominent. features, and evaluat.e.the percormanre of buffers in srveral cxisting -systems, using trace driven siinulat,ion. We compare ihse srhemes wit,ti a recentl) pro1iose\"d queue-based in- sihction cache nieniory. An implementation indrprn- dent. perforrnaiice metric is proposed for thi. various or- ganizations and used for the evaluat.ions. M'r analyze the simulation results and discuss the eIfec.1 of various paralneters RIIC~ as prefetch threshold, bus width and buffer size on performance.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116992469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empirical study of the CRAY Y-MP processor using the PERFECT club benchmarks","authors":"S. Vajapeyam, G. Sohi, W. Hsu","doi":"10.1145/115952.115970","DOIUrl":"https://doi.org/10.1145/115952.115970","url":null,"abstract":"Characterization of machines, by studying pro~am usage of their architectural and organizational features, IS art essential ~art of the desi~n recess. ln this aper we re ort Y EL an empimcal study of a smg e processor of t e CRAY Y- P, using as benchmarks long-running scientific applications from the PERFECT Club benchmark set. Since the compiler plays a major mle in determining machine utilization and program execution speed, we compile our benchmarks usin the state-of-the-art Cray Research production FORTRA” compiler. We investigate instruction set usage, operation execution counts, sizes of basic blocks in the prorams, and instruction issue rate. We observe, among other 3“ mgs, that the vectorized fraction of the dynamic rogram % operation count ranges from 4% to %% for our bent marks, Instructions that move values between the scalar registers and corresponding backup registers form a si nificant fraction of the dynamic instruction count. Basic %locks which are more than a hundred instructions in size are significant in number; both small and large basic blocks are important from the point of view of pro ram performance. The E","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116893766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instruction level profiling and evaluation of the IBM RS/6000","authors":"Chriss Stephens, B. Cogswell, J. Heinlein, Gregory Palmer, John Paul Shen","doi":"10.1145/115952.115971","DOIUrl":"https://doi.org/10.1145/115952.115971","url":null,"abstract":"This paper reports preliminary results from using goblin, a new instruction level profiling system, to evaluate the IBM RISC System/6000 architecture. The evaluation presented is based on the SPEC benchmark suite. Each SPEC program (except gcc) is processed by goblin to produce an instrumented version. During execution of the instrumented program, profiling routines are invoked which trace the execution of the program. These routines also collect statistics on dynamic instruction mix, branching behavior, and resource utilization: Based on these statistics, the actual performance and the architectural efficiency of the RS/SOOO are evaluated. In order to provide a context for this evaluation, a comparison to the DECStation 3100 is also presented. The entire profiling and evaluation experiment on nine of the ten SPEC programs involves tracing and analyzing over 32 billion instructions on the RS/6000. The evaluation indicates that for the SPEC benchmark suite the architecture of the RS/6000 is well balanced and exhibits impressive performance, especially on the floating-point intensive applications.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"138 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122281450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of memory system extensions","authors":"Kai Li, K. Petersen","doi":"10.1109/ISCA.1991.1021602","DOIUrl":"https://doi.org/10.1109/ISCA.1991.1021602","url":null,"abstract":"A traditional memory system for a uniprocessor consists of one or two levels of cache, a main memory and a backing store. One can extend such a memory sys tem by adding inexpensive but slower memories into the memory hierarchy. This paper uses an experimental approach to evaluate two methods of extending a memory system: direct and caching. The direct method adds the slower memory into the memory hierarchy by putting it at the same level as the main memory, allowing the CPU to access the slower memories directly; whereas the caching method puts the slower memory between the main memory and the backing store, using the main memory as a cache for the slower memory. We have implemented both approaches and our experiments indicate that applications with very large data structures can benefit significantly using an extended memory system, and that the direct approach outperforms the caching approach in memory-bound applications.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129553646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Strategies for achieving improved processor throughput","authors":"M. Farrens, A. Pleszkun","doi":"10.1145/115952.115988","DOIUrl":"https://doi.org/10.1145/115952.115988","url":null,"abstract":"Deeply pipelined processors have relatively low issue rates due to dependencies between instructions. In this paper we examine the possibility of interleaving a second stream of instructions into the pipeline, which would issue instructions during the cycles the first stream was unable to. Such an interleaving has the potential to significantly increase the throughput of a processor without seriously imparing the execution of either process. We propose a dynamic interleaving of at most 2 instructions streams, which share the the pipelined functional units of a machine. To support the interleaving of 2 instruction streams a number of interleaving policies are. described and discused. Finally, the amount of improvement in processor throughput is evaluated by simulating the interleaving policies for several machine varianv;.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123527217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reducing memory contention in shared memory multiprocessors","authors":"D. Harper","doi":"10.1145/115952.115960","DOIUrl":"https://doi.org/10.1145/115952.115960","url":null,"abstract":"Reducing Memory Contention in Multiprocessors~ D. T. Harper III Shared Memory Department of Electrical Engineering The University of Texas at Dallas P.O. BOX 830688, NIP 33 Richardson, Texas 75083-0688 USA (214) 690-2893","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126131110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IXM2: a parallel associative processor","authors":"T. Higuchi, T. Furuya, Ken'ichi Handa, Naoto Takahashi, H. Nishiyama, A. Kokubu","doi":"10.1145/115952.115956","DOIUrl":"https://doi.org/10.1145/115952.115956","url":null,"abstract":"This paper describes a parallel associative processor, lXM2, developed mainly for semantic network processing. IXM2 consists of 64 associative processors and 9 network processors, having a total of 256K words of associative memory. The large associative memory enables 65,536 semantic network nodes to be processed in parallel and reduces the order of algorithmic complexity to O( 1) in,basic semantic net operations. It is shown that IXM2 has computing power comparable to that of a Connection Machine. Programming for lXM2 is performed with the knowledge representation language IXL, a superset of Prolog, so that IXM2 can be utilized as a back-end to AI workstations.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130742590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Virtualizing the VAX architecture","authors":"J. S. Hall, Paul T. Robinson","doi":"10.1145/115952.115990","DOIUrl":"https://doi.org/10.1145/115952.115990","url":null,"abstract":"This paper describes modifications to the VAX architecture to support virtual machines. The VAX architecture contains several instructions that are sensitive but not privileged. It is also the first architecture with more than two protection rings to support virtual machines. A technique for mapping four virtual rings onto three physical rings, employing both software and microcode, is described. Differences between the modified and standard VAX architectures are presented. along with a description of the virtual VAX computer.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126860259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multithreading: a revisionist view of dataflow architectures","authors":"G. Papadopoulos, K. R. Traub","doi":"10.1145/115953.115986","DOIUrl":"https://doi.org/10.1145/115953.115986","url":null,"abstract":"Although they are powerful intermediate representations for compilers, pure dataflow graphs are incomplete, and perhaps even undesirable, machine languages. They are incomplete because it is hard to encode critical sections and imperative operations which are essential for the efficient execution of operating system functions, such as resource management. They may be undesirable because they imply a uniform dynamic scheduling policy for all instructions, preventing a compiler from expressing a static schedule which could result in greater run time efficiency, both by reducing redundant operand synchronization, and by using high speed registers to communicate state between instructions. In this paper, we develop a new machine-level programming model which builds upon two previous improvements to the dataflow execution model: sequential scheduling of instructions, and multiported registers for expression temporaries. Surprisingly, these improvements have required almost no architectural changes to explicit token store (ETS) dataflow hardware, only a shift in mindset when reasoning about how that hardware works. Rather than viewing computational progress as the consumption of tokens and the firing of enabled instructions, we instead reason about the evolution of multiple, interacting sequential threads, where forking and joining are extremely efficient. Because this new paradigm has proven so valuable in coding resource management operations and in improving code efficiency, it is now the cornerstone of the Monsoon instruction set architecture and macro assembly language. In retrospect, this suggests that there is a continuum of multithreaded architectures, with pure ETS dataflow and single threaded von Neumann at the extrema. We use this new perspective to better understand the relative strengths and weaknesses of the Monsoon implement ation.","PeriodicalId":187095,"journal":{"name":"[1991] Proceedings. The 18th Annual International Symposium on Computer Architecture","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126589183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}