{"title":"Integrated predicated and speculative execution in the IMPACT EPIC architecture","authors":"David I. August, D. Connors, S. Mahlke, J. Sias, K. Crozier, B. Cheng, Patrick R. Eaton, Q. B. Olaniran, W. Hwu","doi":"10.1109/ISCA.1998.694777","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694777","url":null,"abstract":"Explicitly Parallel Instruction Computing (EPIC) architectures require the compiler to express program instruction level parallelism directly to the hardware. EPIC techniques which enable the compiler to represent control speculation, data dependence speculation, and predication have individually been shown to be very effective. However these techniques have not been studied in combination with each other. This paper presents the IMPACT EPIC Architecture to address the issues involved in designing processors based on these EPIC concepts. In particular we focus on new execution and recovery models in which microarchitectural support for predicated execution is also used to enable efficient recovery from exceptions caused by speculatively executed instructions. This paper demonstrates that a coherent framework to integrate the three techniques can be elegantly designed to achieve much better performance than each individual technique could alone provide.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117028146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance","authors":"Gheith A. Abandah, E. Davidson","doi":"10.1145/279358.279401","DOIUrl":"https://doi.org/10.1145/279358.279401","url":null,"abstract":"Advances in microarchitecture, packaging, and manufacturing processes enable designers to build new systems with higher performance and scalability. Using microbenchmark techniques, we contrast the memory and communication performance of two generations of the HP/Convex Exemplar scalable parallel processing system. The SPP1000 and SPP2000 have significant architectural and implementation differences, but maintain upward binary compatibility. The SPP2000 employs manufacturing and packaging advances to obtain shorter system interconnects with wider data paths and improved functionality thereby reducing the latency and increasing the bandwidth of remote communication. Although the memory latency is not significantly improved, newer out-of-order execution processors coupled with nonblocking caches achieve much higher memory bandwidth. The SPP2000 has a richer system interconnect topology that allows scalability to a larger number of processors. The SPP2000 also employs innovations in its coherence protocols to improve synchronization and communication performance. This paper characterizes the performance effects of these changes, and identifies some remaining inefficiencies, in the cache coherence protocol and the node configuration, that future systems should address.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123201549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Branch prediction based on universal data compression algorithms","authors":"E. Federovsky, M. Feder, S. Weiss","doi":"10.1145/279358.279370","DOIUrl":"https://doi.org/10.1145/279358.279370","url":null,"abstract":"Data compression and prediction are closely related. Thus prediction methods based on data compression algorithms have been suggested for the branch prediction problem. In this work we consider two universal compression algorithms: prediction by partial matching (PPM), and a recently developed method, context tree weighting (CTW). We describe the prediction algorithms induced by these methods. We also suggest adaptive algorithms variations of the basic methods that attempt to fit limited memory constraints and to match the non-stationary nature of the branch sequence. Furthermore, we show how to incorporate address information and to combine other relevant data. Finally, we present simulation results for selected programs from the SPECint95, SYSmark/32, SYSmark/NT, and transactional processing benchmarks. Our results are most promising in programs with difficult to predict branch behavior.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131420506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Low load latency through sum-addressed memory (SAM)","authors":"William L. Lynch, G. Lauterbach, J. Chamdani","doi":"10.1145/279358.279406","DOIUrl":"https://doi.org/10.1145/279358.279406","url":null,"abstract":"Load latency contributes significantly to execution time. Because most cache accesses hit, cache-hit latency becomes an important component of expected load latency. Most modern microprocessors have base+offset addressing loads; thus effective cache-hit latency includes an addition as well as the RAM access. This paper introduces a new technique used in the UltraSPARC III microprocessor Sum-Addressed Memory (SAM), which performs true addition using the decoder of the RAM array, with very low latency. We compare SAM with other methods for reducing the add part of load latency. These methods include sum-prediction with recovery, and bitwise indexing with duplicate-tolerance. The results demonstrate the superior performance of SAM.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129745811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An analysis of database workload performance on simultaneous multithreaded processors","authors":"J. Lo, L. Barroso, S. Eggers, K. Gharachorloo, H. Levy, S. Parekh","doi":"10.1145/279358.279367","DOIUrl":"https://doi.org/10.1145/279358.279367","url":null,"abstract":"Simultaneous multithreading (SMT) is an architectural technique in which the processor issues multiple instructions from multiple threads each cycle. While SMT has been shown to be effective on scientific workloads, its performance on database systems is still an open question. In particular, database systems have poor cache performance, and the addition of multithreading has the potential to exacerbate cache conflicts. This paper examines database performance on SMT processors using traces of the Oracle database management system. Our research makes three contributions. First, it characterizes the memory-system behavior of database systems running on-line transaction processing and decision support system workloads. Our data show that while DBMS workloads have large memory footprints, there is substantial data reuse in a small, cacheable \"critical\" working set. Second, we show that the additional data cache conflicts caused by simultaneous-multithreaded instruction scheduling can be nearly eliminated by the proper choice of software-directed policies for virtual-to-physical page mapping and per-process address offsetting. Our results demonstrate that with the best policy choices, D-cache miss rates on an 8-context SMT are roughly equivalent to those on a single-threaded superscalar. Multithreading also leads to better interthread instruction cache sharing, reducing I-cache miss rates by up to 35%. Third, we show that SMT's latency tolerance is highly effective for database applications. For example, using a memory-intensive OLTP workload, an 8-context SMT processor achieves a 3-fold increase in instruction throughput over a single-threaded superscalar with similar resources.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127124987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing TLB reach using superpages backed by shadow memory","authors":"M. Swanson, L. Stoller, J. Carter","doi":"10.1109/ISCA.1998.694775","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694775","url":null,"abstract":"The amount of memory that can be accessed without causing a translation lookaside buffer (TLB) fault, the reach of a TLB, is failing to keep pace with the increasingly large working sets of applications. We propose to extend TLB reach via a novel Memory Controller TLB (MTLB) that lets us aggressively create superpages from non-contiguous, unaligned regions of physical memory. This flexibility increases the OS's ability to use superpages on arbitrary application data. The MTLB supports shadow pages, regions of physical address space for which the MTLB remaps accesses to \"real\" physical pages. The MTLB preserves per-base-page referenced and dirty bits, which enables the OS to swap shadow-backed superpages a page at a time, unlike conventional superpages. Simulation of five applications, including two SPECint95 benchmarks, demonstrated that a modest-sized MTLB improves performance of applications with moderate-to-high TLB miss rates by 5-20%. Simulation also showed that this mechanism can more than double the effective reach of a processor TLB with no modification to the processor MMU.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114646711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic history-length fitting: a third level of adaptivity for branch prediction","authors":"Toni Juan, K. Sanjeevan, J. Navarro","doi":"10.1109/ISCA.1998.694771","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694771","url":null,"abstract":"Accurate branch prediction is essential for obtaining high performance in pipelined superscalar processors that execute instructions speculatively. Some of the best current predictors combine a part of the branch address with a fixed amount of global history of branch outcomes in order to make a prediction. These predictors cannot perform uniformly well across all workloads because the best amount of history to be used depends on the code, the input data and the frequency of context switches. Consequently, all predictors that use a fixed history length are therefore unable to perform up to their maximum potential. We introduce a method-called DHLF-that dynamically determines the optimum history length during execution, adapting to the specific requirements of any code, input data and system workload. Our proposal adds an extra level of adaptivity to two-level adaptive branch predictors. The DHLF method can be applied to any one of the predictors that combine global branch history with the branch address. We apply the DHLF method to gshare (dhlf-gshare) and obtain near-optimal results for all SPECint95 benchmarks, with and without context switches. Some results are also presented for gskewed (dhlf-gskewed), confirming that other predictors can benefit from our proposal.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"105 12S1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122240489","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Switcherland: a QoS communication architecture for workstation clusters","authors":"H. Eberle, E. Oertli","doi":"10.1145/279358.279373","DOIUrl":"https://doi.org/10.1145/279358.279373","url":null,"abstract":"Computer systems have become powerful enough to process continuous data streams such as video or animated graphics. While processing power and communication bandwidth of today's systems typically are sufficient, quality of service (QoS) guarantees as required for handling such data types cannot be provided by these systems in adequate ways. We present Switcherland, a scalable communication architecture based on crossbar switches that provides QoS guarantees for workstation clusters in the form of reserved bandwidth and bounded transmission delays. Similar to the ATM technology Switcherland provides QoS guarantees with the help of service classes, that is, data transfers are characterized as variable bit rare traffic or constant bit rate traffic. However, unlike LAN technologies, Switcherland is optimized for cluster computing in that (i) it serves as a backplane interconnection fabric as well as a LAN, (ii) it extends support for service classes by also covering the end nodes of the network, (iii) it provides low latency in the order of one microsecond per switch, and (iv) it uses a communication model based on a global memory to simplify programming.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129809284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploiting spatial locality in data caches using spatial footprints","authors":"Sanjeev Kumar, C. Wilkerson","doi":"10.1109/ISCA.1998.694794","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694794","url":null,"abstract":"Modern cache designs exploit spatial locality by fetching large blocks of data called cache lines on a cache miss. Subsequent references to words within the same cache line result in cache hits. Although this approach benefits from spatial locality, less than half of the data brought into the cache gets used before eviction. The unused portion of the cache line negatively impacts performance by wasting bandwidth and polluting the cache by replacing potentially useful data that would otherwise remain in the cache. This paper describes an alternative approach to exploit spatial locality available in data caches. On a cache miss, our mechanism, called Spatial Footprint Predictor (SFP), predicts which portions of a cache block will get used before getting evicted. The high accuracy of the predictor allows us to exploit spatial locality exhibited in larger blocks of data yielding better miss ratios without significantly impacting the memory access latencies. Our evaluation of this mechanism shows that the miss rate of the cache is improved, on average, by 18% in addition to a significant reduction in the bandwidth requirement.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124305864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory system characterization of commercial workloads","authors":"L. Barroso, K. Gharachorloo, Edouard Bugnion","doi":"10.1109/ISCA.1998.694758","DOIUrl":"https://doi.org/10.1109/ISCA.1998.694758","url":null,"abstract":"This paper presents a detailed performance study of three important classes of commercial workloads: online transaction processing (OLTP), decision support systems (DSS), and Web index search. We use the Oracle commercial database engine for our OLTP and DSS workloads, and the AltaVista search engine for our Web index search workload. This study characterizes the memory system behavior of these workloads through a large number of architectural experiments on Alpha multiprocessors augmented with full system simulations to determine the impact of architectural trends. We also identify a set of simplifications that make these workloads more amenable to monitoring and simulation without affecting representative memory system behavior. We observe that systems optimized for OLTP versus DSS and index search workloads may lead to diverging designs, specifically in the size and speed requirements for off-chip caches.","PeriodicalId":393075,"journal":{"name":"Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116686261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}