Proceedings Fifth International Symposium on High-Performance Computer Architecture: Latest Papers (Page 4)

Permutation development data layout (PDDL)

Proceedings Fifth International Symposium on High-Performance Computer Architecture Pub Date : 1999-01-09 DOI: 10.1109/HPCA.1999.744365

T. Schwarz, J. Steinberg, W. Burkhard

Citations: 17

A scalable cache coherent scheme exploiting wormhole routing networks

Proceedings Fifth International Symposium on High-Performance Computer Architecture Pub Date : 1999-01-09 DOI: 10.1109/HPCA.1999.744367

Yunseok Rhee, Joonwon Lee

Citations: 1

Instruction recycling on a multiple-path processor

Proceedings Fifth International Symposium on High-Performance Computer Architecture Pub Date : 1999-01-09 DOI: 10.1109/HPCA.1999.744323

S. Wallace, D. Tullsen, B. Calder

Citations: 13

Using Lamport clocks to reason about relaxed memory models

Proceedings Fifth International Symposium on High-Performance Computer Architecture Pub Date : 1999-01-09 DOI: 10.1109/HPCA.1999.744379

A. Condon, M. Hill, Manoj Plakal, Daniel J. Sorin

Abstract: Cache coherence protocols of current shared-memory multiprocessors are difficult to verify. Our previous work proposed an extension of Lamport's logical clocks for showing that multiprocessors can implement sequential consistency (SC) with an SGI Origin 2000-like directory protocol and a Sun Gigaplane-like split-transaction bus protocol. Many commercial multiprocessors, however, implement more relaxed models, such as SPARC Total Store Order (TSO), a variant of processor consistency, and Compaq (DEC) Alpha, a variant of weak consistency. This paper applies Lamport clocks to both a TSO and an Alpha implementation. Both implementations are based on the same Sun Gigaplane-like split-transaction bus protocol we previously used, but the TSO implementation places a first-in-first-out write buffer between a processor and its cache, while the Alpha implementation uses a coalescing write buffer. Both write buffers satisfy read requests for pending writes (i.e., do bypassing) without requiring the write to be immediately written to cache. Analysis shows how to apply Lamport clocks to verify TSO and Alpha specifications at the architectural level.
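The two write buffers named in the abstract above can be pictured with a minimal sketch. It is an illustration only, not the paper's model: the class names, the dictionary standing in for the cache, the word-level (rather than block-level) coalescing, and the drain order of the coalescing buffer are all assumptions made for the example.

```python
from collections import deque


class FifoWriteBuffer:
    """Hypothetical TSO-style buffer: stores reach the cache in program order."""

    def __init__(self):
        self.pending = deque()  # (address, value) pairs, oldest first

    def write(self, addr, value):
        self.pending.append((addr, value))

    def read(self, addr, cache):
        # Bypassing: the youngest pending store to addr wins over the cache.
        for pending_addr, value in reversed(self.pending):
            if pending_addr == addr:
                return value
        return cache.get(addr)

    def drain_one(self, cache):
        # Retire the oldest store, which preserves store-to-store order.
        if self.pending:
            addr, value = self.pending.popleft()
            cache[addr] = value


class CoalescingWriteBuffer:
    """Hypothetical Alpha-style buffer: stores to one address merge into one entry."""

    def __init__(self):
        self.pending = {}  # address -> latest value

    def write(self, addr, value):
        self.pending[addr] = value  # a later store overwrites the earlier one

    def read(self, addr, cache):
        # Bypassing: a pending store satisfies the read before the cache does.
        if addr in self.pending:
            return self.pending[addr]
        return cache.get(addr)

    def drain_one(self, cache):
        # Assumption: entries drain in first-insertion order; real hardware
        # is free to pick another order under a weaker model.
        if self.pending:
            addr = next(iter(self.pending))
            cache[addr] = self.pending.pop(addr)
```

Writing the sequence (0x10, 1), (0x20, 2), (0x10, 3) leaves three entries in the FIFO buffer and two in the coalescing one, while a read of 0x10 returns 3 from either buffer before anything has reached the cache.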

Citations: 38

Memory hierarchy considerations for fast transpose and bit-reversals

Proceedings Fifth International Symposium on High-Performance Computer Architecture Pub Date : 1999-01-09 DOI: 10.1109/HPCA.1999.744320

K. Gatlin, L. Carter

Citations: 24

Distributed modulo scheduling

Proceedings Fifth International Symposium on High-Performance Computer Architecture Pub Date : 1999-01-09 DOI: 10.1109/HPCA.1999.744349

M. M. Fernandes, J. Llosa, N. Topham

Citations: 48

Comparative evaluation of fine- and coarse-grain approaches for software distributed shared memory

Proceedings Fifth International Symposium on High-Performance Computer Architecture Pub Date : 1999-01-09 DOI: 10.1109/HPCA.1999.744377

S. Dwarkadas, K. Gharachorloo, L. Kontothanassis, D. Scales, M. Scott, R. Stets

Abstract: Symmetric multiprocessors (SMPs) connected with low-latency networks provide attractive building blocks for software distributed shared memory systems. Two distinct approaches have been used: the fine-grain approach that instruments application loads and stores to support a small coherence granularity, and the coarse-grain approach based on virtual memory hardware that provides coherence at a page granularity. Fine-grain systems offer a simple migration path for applications developed on hardware multiprocessors by supporting coherence protocols similar to those implemented in hardware. On the other hand, coarse-grain systems can potentially provide higher performance through more optimized protocols and larger transfer granularities, while avoiding instrumentation overheads. Numerous studies have examined each approach individually, but major differences in experimental platforms and applications make comparison of the approaches difficult. This paper presents a detailed comparison of two mature systems, Shasta and Cashmere, representing the fine- and coarse-grain approaches, respectively. Both systems are tuned to run on the same commercially available, state-of-the-art cluster of AlphaServer SMPs connected via a Memory Channel network. As expected, our results show that Shasta provides robust performance for applications tuned for hardware multiprocessors, and can better tolerate fine-grain synchronization. In contrast, Cashmere is highly sensitive to fine-grain synchronization, but provides a performance edge for applications with coarse-grain behavior. Interestingly, we found that the performance gap between the systems can often be bridged by program modifications that address coherence and synchronization granularity. In addition, our study reveals some unexpected results related to the interaction of current compiler technology with application instrumentation, and the ability of SMP-aware protocols to avoid certain performance disadvantages of coarse-grain approaches.

Citations: 34