Conference Proceedings. The 24th Annual International Symposium on Computer Architecture: Latest Publications

Coherence Controller Architectures For SMP-based CC-NUMA Multiprocessors
Maged M. Michael, Ashwini K. Nanda, B. Lim, M. Scott
DOI: 10.1145/264107.264203
Abstract: Scalable distributed shared-memory architectures rely on coherence controllers on each processing node to synthesize cache-coherent shared memory across the entire machine. The coherence controllers execute coherence protocol handlers that may be hardwired in custom hardware or programmed in a protocol processor within each coherence controller. Although custom hardware runs faster, a protocol processor allows the coherence protocol to be tailored to specific application needs and may shorten hardware development time. Previous research shows that the increase in application execution time due to protocol processors over custom hardware is minimal. With the advent of SMP nodes and faster processors and networks, the tradeoff between custom hardware and protocol processors needs to be reexamined. This paper studies the performance of custom-hardware and protocol-processor-based coherence controllers in SMP-node-based CC-NUMA systems on applications from the SPLASH-2 suite. Using realistic parameters and detailed models of existing state-of-the-art system components, it shows that the occupancy of coherence controllers can limit the performance of applications with high communication requirements, where the execution time using protocol processors can be twice as long as using custom hardware. To gain a deeper understanding of the tradeoff, we investigate the effect of varying several architectural parameters that influence the communication characteristics of the applications and the underlying system on coherence controller performance. We identify measures of applications' communication requirements and their impact on the performance penalty of protocol processors, which can help system designers predict performance penalties for other applications. We also study the potential of improving the performance of hardware-based and protocol-processor-based coherence controllers by separating or duplicating critical components.
Citations: 50
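To make the occupancy argument concrete, the following is a minimal back-of-the-envelope sketch of how controller occupancy caps served protocol-event throughput. All handler occupancies and event rates are illustrative assumptions, not figures from the paper.

```python
# Hedged sketch: occupancy-limited throughput of a coherence controller.
# All numbers are illustrative assumptions, not measurements from the paper.

def served_rate(handler_occupancy_ns, offered_events_per_us):
    """Protocol events per microsecond actually served when each event
    occupies the controller for handler_occupancy_ns nanoseconds."""
    max_rate = 1000.0 / handler_occupancy_ns   # controller's service capacity
    return min(offered_events_per_us, max_rate)

custom_hw_ns = 40.0        # hypothetical custom-hardware handler occupancy
protocol_proc_ns = 120.0   # hypothetical protocol-processor occupancy (~3x slower)

for offered in (2.0, 5.0, 10.0, 20.0):   # offered protocol events per microsecond
    hw = served_rate(custom_hw_ns, offered)
    pp = served_rate(protocol_proc_ns, offered)
    print(f"offered={offered:5.1f}/us  served: hw={hw:5.2f}/us  pp={pp:5.2f}/us")
```

Under these made-up numbers, the protocol processor saturates near 8.3 events/µs while custom hardware keeps up to 25 events/µs, which is the flavor of occupancy limit the paper quantifies for communication-intensive applications.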
The Design And Analysis Of A Cache Architecture For Texture Mapping
Z. S. Hakura, Anoop Gupta
DOI: 10.1145/264107.264152
Abstract: The effectiveness of texture mapping in enhancing the realism of computer generated imagery has made support for real-time texture mapping a critical part of graphics pipelines. Despite a recent surge in interest in three-dimensional graphics from computer architects, high-quality high-speed texture mapping has so far been confined to costly hardware systems that use brute-force techniques to achieve high performance. One obstacle faced by designers of texture mapping systems is the requirement of extremely high bandwidth to texture memory. High bandwidth is necessary since there are typically tens to hundreds of millions of accesses to texture memory per second. In addition, to achieve the high clock rates required in graphics pipelines, low-latency access to texture memory is needed. In this paper, we propose the use of texture image caches to alleviate the above bottlenecks, and evaluate various tradeoffs that arise in such designs. We find that the factors important to cache behavior are (i) the representation of texture images in memory, (ii) the rasterization order on screen, and (iii) the cache organization. Through a detailed investigation of these issues, we explore the best way to exploit locality of reference and determine whether this technique is robust with respect to different scenes and different amounts of texture. Overall, we observe that there is a significant amount of temporal and spatial locality and that the working set sizes are relatively small (at most 16KB) across all cases that we studied. Consequently, the memory bandwidth requirements of a texture cache system are substantially lower (at least three times and as much as fifteen times) than the memory bandwidth requirements of a system which achieves equivalent performance but does not utilize a cache. These results are very encouraging and indicate that caching is a promising approach to designing memory systems for texture mapping.
Citations: 183
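One of the cache-friendly texture representations of the kind the paper evaluates is a blocked ("tiled") layout, where a small square of texels is stored contiguously so that a bilinear filtering footprint tends to hit the same cache lines. The tile size, texel size, and texture width below are illustrative assumptions, not the paper's parameters.

```python
# Hedged sketch of a blocked (tiled) texture layout for cache locality.
# TILE, TEXEL_BYTES, and WIDTH_TILES are made-up example values.

TILE = 8           # assumed 8x8-texel tiles
TEXEL_BYTES = 4    # assumed 32-bit texels
WIDTH_TILES = 128  # assumed 1024-texel-wide texture, i.e. 1024 / 8 tiles per row

def tiled_address(u, v):
    """Map texel (u, v) to a byte address so that each 8x8 neighborhood of
    texels is contiguous in memory and tends to share cache lines."""
    tile_x, in_x = divmod(u, TILE)
    tile_y, in_y = divmod(v, TILE)
    tile_index = tile_y * WIDTH_TILES + tile_x
    texel_in_tile = in_y * TILE + in_x
    return (tile_index * TILE * TILE + texel_in_tile) * TEXEL_BYTES

# The four texels of a bilinear footprint land close together in memory:
print([tiled_address(u, v) for u, v in [(10, 20), (11, 20), (10, 21), (11, 21)]])
```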
On Deadlocks In Interconnection Networks
T. Pinkston, Sugath Warnakulasuriya
DOI: 10.1145/264107.264127
Abstract: Deadlock avoidance-based and deadlock recovery-based routing algorithms have been proposed in recent years without full understanding of the likelihood and characteristics of actual deadlocks in interconnection networks. This work models the interrelationships between routing freedom, message blocking, correlated resource dependencies and deadlock formation. We empirically show that increasing routing freedom, as achieved by allowing unrestricted routing over multiple virtual channels, makes deadlocks highly improbable and reduces the likelihood of other types of correlated message blocking behavior that can degrade performance. Our results further substantiate that recovery-based routing algorithms have a higher potential performance advantage over deadlock avoidance-based routing algorithms which, inherently, allow less routing freedom.
Citations: 87
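A deadlock in this setting shows up as a cycle of correlated resource dependencies: blocked messages holding channels while waiting for channels held by other blocked messages. The following is a minimal sketch of detecting such a cycle in a channel wait-for graph; the graph instance is a made-up example, not data from the paper.

```python
# Hedged sketch: a deadlock manifests as a cycle in a channel wait-for graph.
# The example graphs below are invented for illustration.

def has_cycle(wait_for):
    """wait_for maps a channel to the set of channels that its blocked
    messages are waiting to acquire. A cycle implies a deadlocked set."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in wait_for}

    def dfs(c):
        color[c] = GREY
        for nxt in wait_for.get(c, ()):
            if color.get(nxt, WHITE) == GREY:
                return True                        # back edge -> cycle found
            if color.get(nxt, WHITE) == WHITE and dfs(nxt):
                return True
        color[c] = BLACK
        return False

    return any(color[c] == WHITE and dfs(c) for c in wait_for)

# c0 -> c1 -> c2 -> c0 is a cyclic dependency (deadlock); the second graph is not.
print(has_cycle({"c0": {"c1"}, "c1": {"c2"}, "c2": {"c0"}}))   # True
print(has_cycle({"c0": {"c1"}, "c1": {"c2"}, "c2": set()}))    # False
```

Greater routing freedom (more virtual channels, fewer routing restrictions) gives blocked messages more alternative edges to follow, which is why such cycles become far less likely, as the paper shows empirically.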
Data Prefetching On The HP PA-8000
Vatsa Santhanam, Edward H. Gornish, W. Hsu
DOI: 10.1145/264107.264208
Abstract: Memory latency is a major issue for many modern microprocessor based systems, including the Hewlett-Packard PA-8000. Due to its fast clock rate and wide issue capability, cache misses in the PA-8000 are very expensive. The PA-8000 combines out-of-order execution with multiple outstanding memory requests to tolerate memory latency; however, this approach has its limitations. In order to substantially reduce much of the memory latency penalty, the PA-8000 uses software-based data cache prefetching. In this paper, we discuss the implementation of the data prefetch generation algorithm in the Hewlett-Packard Precision Architecture (HP-PA) compiler. We present performance results for SPECfp95 on a PA-8000 system that show speedups, due to data prefetching, of up to 100%.
Citations: 90
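A core decision any software-prefetching compiler pass must make is the prefetch distance: how many loop iterations ahead to issue the prefetch so the data arrives before it is used. The sketch below shows that computation with assumed latency and loop-cost numbers; these are not PA-8000 figures or the HP-PA compiler's actual algorithm.

```python
# Hedged sketch of a prefetch-distance calculation for software data prefetching.
# The 60-cycle latency and 6-cycle loop body are illustrative assumptions.

import math

def prefetch_distance(memory_latency_cycles, cycles_per_iteration):
    """Issue the prefetch this many iterations ahead to hide the miss latency."""
    return math.ceil(memory_latency_cycles / cycles_per_iteration)

d = prefetch_distance(60, 6)
print(d)   # 10 -> the pass would prefetch a[i + 10*stride] inside the loop body
```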
Implementing Multidestination Worms In Switch-based Parallel Systems: Architectural Alternatives And Their Impact
Rajeev Sivaram, C. Stunkel, D. Panda
DOI: 10.1145/264107.264129
Abstract: Multidestination message passing has been proposed as an attractive mechanism for efficiently implementing multicast and other collective operations on direct networks. However, applying this mechanism to switch-based parallel systems is non-trivial. In this paper we propose alternative switch architectures with differing buffer organizations to implement multidestination worms on switch-based parallel systems. First, we discuss issues related to such implementation (deadlock-freedom, replication mechanisms, header encoding, and routing). Next, we demonstrate how an existing central-buffer-based switch architecture supporting unicast message passing can be enhanced to accommodate multidestination message passing. Similarly, implementing multidestination worms on an input-buffer-based switch architecture is discussed. Both of these implementations are evaluated against each other as well as against a software-based scheme using the central buffer organization. Simulation experiments under a range of traffic (multiple multicast, bimodal, varying degree of multicast, and message length) and system size are used for evaluation. The study demonstrates the superiority of the central-buffer-based switch architecture. It also indicates that under bimodal traffic the central-buffer-based hardware multicast implementation affects background unicast traffic less adversely compared to a software-based multicast implementation. Thus, multidestination message passing can easily be applied to switch-based parallel systems to deliver good collective communication performance.
Citations: 63
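Header encoding and replication are two of the issues the paper raises. One common encoding for a multidestination worm is a bit string naming the destination set; at a switch, that header must be partitioned across the output ports through which the named destinations are reached. The sketch below illustrates that partitioning with a made-up 2-port routing table; it is not the paper's encoding or switch design.

```python
# Hedged sketch: splitting a bit-string multidestination header per output port.
# The routing table (port_of_dest) is an invented example.

def split_header(dest_mask, port_of_dest):
    """Group the destinations named in dest_mask (a bit mask) by the output
    port each one is reached through, yielding one sub-header per port."""
    per_port = {}
    dest = 0
    while dest_mask:
        if dest_mask & 1:
            port = port_of_dest[dest]
            per_port[port] = per_port.get(port, 0) | (1 << dest)
        dest_mask >>= 1
        dest += 1
    return per_port

# Destinations 0..7; assume destinations 0-3 exit port 0 and 4-7 exit port 1.
port_of_dest = {d: (0 if d < 4 else 1) for d in range(8)}
print(split_header(0b10110101, port_of_dest))   # {0: 0b101, 1: 0b10110000}
```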
The Agree Predictor: A Mechanism For Reducing Negative Branch History Interference
Eric Sprangle, R. Chappell, M. Alsup, Y. Patt
DOI: 10.1145/264107.264210
Abstract: Deeply pipelined, superscalar processors require accurate branch prediction to achieve high performance. Two-level branch predictors have been shown to achieve high prediction accuracy. It has also been shown that branch interference is a major contributor to the number of branches mispredicted by two-level predictors. This paper presents a new method to reduce the interference problem called agree prediction, which reduces the chance that two branches aliasing the same PHT entry will interfere negatively. We evaluate the performance of this scheme using full traces (both user and supervisor) of the SPECint95 benchmarks. The result is a reduction in the misprediction rate of gcc ranging from 8.62% with a 64K-entry PHT up to 33.3% with a 1K-entry PHT.
Citations: 173
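The key idea is that the PHT counter predicts whether a branch will agree with a per-branch biasing bit rather than predicting taken/not-taken directly, so two branches that alias the same counter interfere less as long as each usually agrees with its own bias. The following is a minimal sketch of that mechanism; the table size, indexing function, and bias-setting policy are illustrative assumptions rather than the paper's exact configuration.

```python
# Hedged sketch of agree prediction: the counter predicts agreement with a
# per-branch bias bit. Sizes and indexing are illustrative assumptions.

PHT_BITS = 10
pht = [2] * (1 << PHT_BITS)   # 2-bit saturating counters, initialized to weak "agree"
bias = {}                     # biasing bit per static branch (here: first outcome seen)

def pht_index(pc, history):
    return (pc ^ history) & ((1 << PHT_BITS) - 1)

def predict(pc, history):
    """Return the predicted direction (True = taken)."""
    agree = pht[pht_index(pc, history)] >= 2
    return agree if bias.get(pc, 1) else not agree

def update(pc, history, taken):
    if pc not in bias:
        bias[pc] = int(taken)             # set the biasing bit on first execution
    idx = pht_index(pc, history)
    agreed = (int(taken) == bias[pc])
    pht[idx] = min(3, pht[idx] + 1) if agreed else max(0, pht[idx] - 1)
```

With this arrangement, two aliasing branches that both mostly follow their own biases push the shared counter in the same ("agree") direction, turning destructive interference into neutral or constructive interference.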
The Energy Efficiency Of IRAM Architectures
R. Fromm, S. Perissakis, N. Cardwell, C. Kozyrakis, B. McGaughy, D. Patterson, Thomas E. Anderson, K. Yelick
DOI: 10.1145/264107.264214
Abstract: Portable systems demand energy efficiency in order to maximize battery life. IRAM architectures, which combine DRAM and a processor on the same chip in a DRAM process, are more energy efficient than conventional systems. The high density of DRAM permits a much larger amount of memory on-chip than a traditional SRAM cache design in a logic process. This allows most or all IRAM memory accesses to be satisfied on-chip. Thus there is much less need to drive high-capacitance off-chip buses, which contribute significantly to the energy consumption of a system. To quantify this advantage we apply models of energy consumption in DRAM and SRAM memories to results from cache simulations of applications reflective of personal productivity tasks on low power systems. We find that IRAM memory hierarchies consume as little as 22% of the energy consumed by a conventional memory hierarchy for memory-intensive applications, while delivering comparable performance. Furthermore, the energy consumed by a system consisting of an IRAM memory hierarchy combined with an energy efficient CPU core is as little as 40% of that of the same CPU core with a traditional memory hierarchy.
Citations: 128
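The style of accounting described in the abstract multiplies per-access energies (on-chip versus high-capacitance off-chip accesses) by access counts from cache simulation. The sketch below illustrates that arithmetic with invented energy and miss-rate numbers; they are not the paper's model parameters or results.

```python
# Hedged sketch of memory-hierarchy energy accounting. The per-access energies
# and hit/miss splits are illustrative assumptions, not the paper's values.

def hierarchy_energy_nj(on_chip_accesses, off_chip_accesses,
                        on_chip_nj=0.5, off_chip_nj=12.0):
    """Total memory-system energy in nanojoules."""
    return on_chip_accesses * on_chip_nj + off_chip_accesses * off_chip_nj

accesses = 10_000_000
conventional = hierarchy_energy_nj(int(accesses * 0.95),  int(accesses * 0.05))
iram_like    = hierarchy_energy_nj(int(accesses * 0.999), int(accesses * 0.001))
print(f"IRAM-like hierarchy uses {100 * iram_like / conventional:.0f}% "
      f"of the conventional hierarchy's memory energy (under assumed parameters)")
```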
Reactive NUMA: A Design For Unifying S-COMA And CC-NUMA
B. Falsafi, D. Wood
DOI: 10.1145/264107.264205
Abstract: This paper proposes and evaluates a new approach to directory-based cache coherence protocols called Reactive NUMA (R-NUMA). An R-NUMA system combines a conventional CC-NUMA coherence protocol with a more recent Simple-COMA (S-COMA) protocol. What makes R-NUMA novel is the way it dynamically reacts to program and system behavior to switch between CC-NUMA and S-COMA and exploit the best aspects of both protocols. This reactive behavior allows each node in an R-NUMA system to independently choose the best protocol for a particular page, thus providing much greater performance stability than either CC-NUMA or S-COMA alone. Our evaluation is both qualitative and quantitative. We first show the theoretical result that R-NUMA's worst-case performance is bounded within a small constant factor (i.e., two to three times) of the best of CC-NUMA and S-COMA. We then use detailed execution-driven simulation to show that, in practice, R-NUMA usually performs better than either a pure CC-NUMA or pure S-COMA protocol, and no more than 57% worse than the best of CC-NUMA and S-COMA, for our benchmarks and base system assumptions.
Citations: 115
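The per-page reactive decision can be pictured as a counter-and-threshold policy: a remote page starts out handled as plain CC-NUMA, and once its blocks have been refetched often enough after eviction, the node switches that page to locally allocated S-COMA handling. The sketch below is a minimal rendering of that idea; the threshold value and exact trigger condition are assumptions for illustration, not the paper's tuned policy.

```python
# Hedged sketch of R-NUMA's per-page protocol choice via a refetch counter.
# REFETCH_THRESHOLD and the trigger condition are illustrative assumptions.

REFETCH_THRESHOLD = 64

class PageState:
    def __init__(self):
        self.protocol = "CC-NUMA"   # remote page starts under the CC-NUMA protocol
        self.refetches = 0

    def on_remote_block_refetch(self):
        """Called when a block of this page is fetched again after eviction."""
        if self.protocol == "CC-NUMA":
            self.refetches += 1
            if self.refetches >= REFETCH_THRESHOLD:
                self.protocol = "S-COMA"   # allocate the page locally from now on

page = PageState()
for _ in range(100):
    page.on_remote_block_refetch()
print(page.protocol)   # "S-COMA": the page proved to be heavily refetched
```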
Effects Of Communication Latency, Overhead, And Bandwidth In A Cluster Architecture
Richard P. Martin, Amin Vahdat, D. Culler, Thomas E. Anderson
DOI: 10.1145/264107.264146
Abstract: This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
Citations: 275
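The knobs varied in this study correspond to the parameters of a LogGP-style cost model: latency L, per-message overhead o, and the per-message gap g that bounds injection rate. A minimal sketch of that model is shown below; the parameter values are illustrative (only the 3 µs versus 103 µs overhead sweep is taken from the abstract), and the formula is the textbook LogP accounting, not the paper's apparatus.

```python
# Hedged sketch of a LogGP-style cost for a burst of short messages.
# L, o, g defaults are illustrative assumptions.

def send_time_us(n_messages, L=5.0, o=3.0, g=6.0):
    """Time (us) until the last of n short messages is received: the last
    message is injected after o + (n-1)*max(o, g), then takes L to cross the
    network and o to be handled at the receiver."""
    last_injected = o + (n_messages - 1) * max(o, g)
    return last_injected + L + o

print(send_time_us(100, o=3.0))     # baseline overhead
print(send_time_us(100, o=103.0))   # overhead inflated as in the study's sweep
```

Raising o from 3 to 103 µs dominates every term of the model, which matches the observed strong sensitivity of applications to overhead relative to latency.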
The Mercury Interconnect Architecture: A Cost-effective Infrastructure For High-performance Servers
W. Weber, Stephen Gold, Pat Helland, Takeshi Shimizu, Thomas Wicki, W. Wilcke
DOI: 10.1145/264107.264149
Abstract: This paper presents HAL's Mercury Interconnect Architecture, an interconnect infrastructure designed to link commodity microprocessors, memory, and I/O components into high-performance multiprocessing servers. Both shared-memory and message-passing systems, as well as hybrid systems, are supported by the interconnect. The key attributes of the Mercury Interconnect Architecture are: low latency, high bandwidth, a modular and flexible design, reliability/availability/serviceability (RAS) features, and a simplicity that enables very cost-effective implementations. The first implementation of the architecture links multiple 4-processor Pentium™ Pro-based nodes. In a 4-node (16-processor) shared-memory configuration, this system achieves a remote read latency of just over 1 µs and a maximum interconnect bandwidth of 6.4 GByte/s. Both of these parameters far outpace comparable SCI-based solutions, while using far fewer hardware components.
Citations: 66