Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235): Latest Articles

Accurate indirect branch prediction

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1145/279358.279380

K. Driesen, Urs Hölzle

Citations: 112

Declustered disk array architectures with optimal and near-optimal parallelism

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1998-04-16 DOI: 10.1109/ISCA.1998.694767

G. A. Alvarez, W. Burkhard, L. Stockmeyer, F. Cristian

Abstract: This paper investigates the placement of data and parity on redundant disk arrays. Declustered organizations have been traditionally used to achieve fast reconstruction of a failed disk's contents. In previous work, Holland and Gibson identified six desirable properties for ideal layouts; however no declustered layout satisfying all properties has been published in the literature. We present a complete, constructive characterization of the collection of ideal declustered layouts possessing all six properties. Given that ideal layouts exist only for a limited set of configurations, we also present two novel layout families. PRIME and RELPR can tolerate multiple failures in a wide variety of configurations with slight deviations from the ideal. Our simulation studies show that the new layouts provide excellent parallel access performance and reduced incremental loads during degraded operation, when compared with previously published layouts. For large accesses and under high loads, response times for the new layouts are typically smaller than those of previously published declustered layouts by a factor of 2.5.

Citations: 55

Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235) Pub Date : 1900-01-01 DOI: 10.1145/279358.279403

V. Soundararajan, M. Heinrich, Ben Verghese, K. Gharachorloo, Anoop Gupta, J. Hennessy

Abstract: Given the limitations of bus-based multiprocessors, CC-NUMA is the scalable architecture of choice for shared-memory machines. The most important characteristic of the CC-NUMA architecture is that the latency to access data on a remote node is considerably larger than the latency to access local memory. On such machines, good data locality can reduce memory stall time and is therefore a critical factor in application performance. In this paper we study the various options available to system designers to transparently decrease the fraction of data misses serviced remotely. This work is done in the context of the Stanford FLASH multiprocessor. FLASH is unique in that each node has a single pool of DRAM that can be used in a variety of ways by the programmable memory controller. We use the programmability of FLASH to explore different options for cache-coherence and data-locality in compute-server workloads. First, we consider two protocols for providing base cache-coherence, one with centralized directory information (dynamic pointer allocation) and another with distributed directory information (SCI). While several commercial systems are based on SCI, we find that a centralized scheme has superior performance. Next, we consider different hardware and software techniques that use some or all of the local memory in a node to improve data locality. Finally, we propose a hybrid scheme that combines hardware and software techniques. These schemes work on the same base platform with both user and kernel references from the workloads. The paper thus offers a realistic and fair comparison of replication/migration techniques that has not previously been feasible.

Citations: 62
