Proceedings Eighth International Symposium on High Performance Computer Architecture: Latest Publications

Power issues related to branch prediction
Dharmesh Parikh, K. Skadron, Yan Zhang, M. Barcella, M. Stan
DOI: 10.1109/HPCA.2002.995713
Abstract: This paper explores the role of branch predictor organization in power/energy/performance tradeoffs for processor design. We find that as a general rule, to reduce overall energy consumption in the processor it is worthwhile to spend more power in the branch predictor if this results in more accurate predictions that improve running time. Two techniques, however, provide substantial reductions in power dissipation without harming accuracy. Banking reduces the portion of the branch predictor that is active at any one time. And a new on-chip structure, the prediction probe detector (PPD), can use pre-decode bits to entirely eliminate unnecessary predictor and branch target buffer (BTB) accesses. Despite the extra power that must be spent accessing the PPD, it reduces local predictor power and energy dissipation by about 45% and overall processor power and energy dissipation by 5-6%.
Citations: 128
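The prediction probe detector described above amounts to a small pre-decode table consulted before the predictor: when the fetched block is known to contain no branch, the predictor and BTB reads are skipped entirely. A minimal sketch of that filtering idea, with a hypothetical access stream and table (not the paper's actual structure):

```python
# Hypothetical sketch of a prediction probe detector (PPD): pre-decode
# bits recorded per fetch block let the front end skip branch predictor
# and BTB accesses for blocks known to contain no branches.

def run_fetch_stream(stream, predecode):
    """stream: fetch-block addresses; predecode: addr -> True if the
    block contains a branch. Returns (predictor_accesses, skipped)."""
    predictor_accesses = 0
    skipped = 0
    for addr in stream:
        if predecode.get(addr, True):   # unknown blocks are probed conservatively
            predictor_accesses += 1     # predictor + BTB are read
        else:
            skipped += 1                # the PPD filtered the access entirely
    return predictor_accesses, skipped

predecode = {0x100: True, 0x140: False, 0x180: False}
stream = [0x100, 0x140, 0x180, 0x140, 0x1c0]
print(run_fetch_stream(stream, predecode))  # -> (2, 3)
```

The energy argument in the abstract is exactly this trade: the PPD itself costs a small lookup per fetch, but each skipped access saves a read of the much larger predictor and BTB arrays.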
The minimax cache: an energy-efficient framework for media processors
O. Unsal, I. Koren, C. M. Krishna, C. A. Moritz
DOI: 10.1109/HPCA.2002.995704
Abstract: This work is based on our philosophy of providing interlayer system-level power awareness in computing systems. Here, we couple this approach with our vision of multi-partitioned memory systems, where memory accesses are separated based on their static predictability and memory footprint and managed with various compiler controlled techniques. We show that media applications are mapped more efficiently when scalar memory accesses are redirected to a mini-cache. Our results indicate that a partitioned 8K cache with the scalars being mapped to a 512 byte mini-cache can be more efficient than a 16K monolithic cache from both performance and energy point of view for most applications. In extensive experiments, we report 30% to 60% energy-delay product savings over a range of system configurations and different cache sizes.
Citations: 36
Control-theoretic techniques and thermal-RC modeling for accurate and localized dynamic thermal management
K. Skadron, T. Abdelzaher, M. Stan
DOI: 10.1109/HPCA.2002.995695
Abstract: This paper proposes the use of formal feedback control theory as a way to implement adaptive techniques in the processor architecture. Dynamic thermal management (DTM) is used as a test vehicle, and variations of a PID controller (Proportional-Integral-Differential) are developed and tested for adaptive control of fetch "toggling." To accurately test the DTM mechanism being proposed, this paper also develops a thermal model based on lumped thermal resistances and thermal capacitances. This model is computationally efficient and tracks temperature at the granularity of individual functional blocks within the processor. Because localized heating occurs much faster than chip-wide heating, some parts of the processor are more likely to be "hot spots" than others. Experiments using Wattch and the SPEC2000 benchmarks show that the thermal trigger threshold can be set within 0.2° of the maximum temperature and yet never enter thermal emergency. This cuts the performance loss of DTM by 65% compared to the previously described fetch toggling technique that uses a response of fixed magnitude.
Citations: 426
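The two ingredients named in the abstract — a lumped thermal-RC block model (C·dT/dt = P − (T − T_amb)/R) and a PID controller driving the fetch duty cycle — can be sketched numerically. All constants below (R, C, the PID gains, the power model) are hypothetical tunings chosen only to make the feedback loop visible, not the paper's values:

```python
# Sketch of PID-controlled fetch toggling over a lumped thermal-RC model:
# the block obeys C*dT/dt = P - (T - T_amb)/R, and the controller scales
# the fetch duty cycle (hence power) to hold temperature at a setpoint.
# All constants are hypothetical tunings for illustration.

def simulate(steps=5000, dt=0.001):
    R, C = 0.8, 0.05                 # thermal resistance (K/W) and capacitance (J/K)
    T_amb, setpoint = 45.0, 80.0
    T = T_amb
    P_max = 60.0                     # power at full fetch duty cycle (W)
    kp, ki, kd = 0.5, 2.0, 0.0005    # PID gains (hypothetical)
    integral, prev_err = 0.0, 0.0
    for _ in range(steps):
        err = setpoint - T
        deriv = (err - prev_err) / dt
        prev_err = err
        u = kp * err + ki * integral + kd * deriv
        duty = min(1.0, max(0.0, u))
        if u == duty:                # simple anti-windup: freeze I while saturated
            integral += err * dt
        P = duty * P_max
        T += dt * (P - (T - T_amb) / R) / C   # lumped thermal-RC update
    return T
```

Running `simulate()` settles the block temperature close to the 80° setpoint. The paper's point is that a controller like this lets the trigger threshold sit much nearer the emergency temperature than a fixed-magnitude toggling response can.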
Energy-efficient processor design using multiple clock domains with dynamic voltage and frequency scaling
Greg Semeraro, G. Magklis, R. Balasubramonian, D. Albonesi, S. Dwarkadas, M. Scott
DOI: 10.1109/HPCA.2002.995696
Abstract: As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a multiple clock domain (MCD) processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end, integer units, floating point units, and load-store units. We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD compared to a modest 3% savings from voltage scaling a single clock and voltage system.
Citations: 401
Exploiting choice in resizable cache design to optimize deep-submicron processor energy-delay
Se-Hyun Yang, Michael D. Powell, B. Falsafi, T. N. Vijaykumar
DOI: 10.1109/HPCA.2002.995706
Abstract: Cache memories account for a significant fraction of a chip's overall energy dissipation. Recent research advocates using "resizable" caches to exploit cache requirement variability in applications to reduce cache size and eliminate energy dissipation in the cache's unused sections with minimal impact on performance. Current proposals for resizable caches fundamentally vary in two design aspects: (1) cache organization, where one organization, referred to as selective-ways, varies the cache's set-associativity, while the other, referred to as selective-sets, varies the number of cache sets, and (2) resizing strategy, where one proposal statically sets the cache size prior to an application's execution, while the other allows for dynamic resizing both within and across applications. In this paper, we compare and contrast, for the first time, the proposed design choices for resizable caches, and evaluate the effectiveness of cache resizings in reducing the overall energy-delay in deep-submicron processors. In addition, we propose a hybrid selective-sets-and-ways cache organization that always offers equal or better resizing granularity than both previously proposed organizations. We also investigate the energy savings from resizing d-cache and i-cache together to characterize the interaction between d-cache and i-cache resizings.
Citations: 149
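The granularity argument in the abstract is easy to make concrete: selective-ways can only reach sizes that are multiples of one way, selective-sets only power-of-two fractions of the base size, and the hybrid reaches every combination of the two. A sketch enumerating the reachable sizes for a hypothetical 64 KB, 4-way base cache (the paper's actual configurations may differ):

```python
# Reachable cache sizes (in KB) for a hypothetical 64 KB, 4-way base cache.
# selective-ways disables ways; selective-sets halves the number of sets;
# the hybrid combines both, giving finer resizing granularity than either.

BASE_KB, WAYS, SET_STEPS = 64, 4, 3   # sets can be halved SET_STEPS times

def selective_ways():
    return {BASE_KB * w // WAYS for w in range(1, WAYS + 1)}

def selective_sets():
    return {BASE_KB // (2 ** s) for s in range(SET_STEPS + 1)}

def hybrid():
    return {BASE_KB * w // (WAYS * 2 ** s)
            for w in range(1, WAYS + 1) for s in range(SET_STEPS + 1)}

print(sorted(selective_ways()))  # [16, 32, 48, 64]
print(sorted(selective_sets()))  # [8, 16, 32, 64]
print(sorted(hybrid()))          # [2, 4, 6, 8, 12, 16, 24, 32, 48, 64]
```

Either baseline offers four sizes here; the hybrid offers ten, and always includes every size either baseline can reach, which is the "equal or better resizing granularity" claim.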
Improving value communication for thread-level speculation
J. Steffan, Christopher B. Colohan, Antonia Zhai, T. Mowry
DOI: 10.1109/HPCA.2002.995699
Abstract: Thread-level speculation (TLS) allows us to automatically parallelize general-purpose programs by supporting parallel execution of threads that might not actually be independent. In this paper, we show that the key to good performance lies in the three different ways to communicate a value between speculative threads: speculation, synchronization and prediction. The difficult part is deciding how and when to apply each method. This paper shows how we can apply value prediction, dynamic synchronization and hardware instruction prioritization to improve value communication and hence performance in several SPECint benchmarks that have been automatically transformed by our compiler to exploit TLS. We find that value prediction can be effective when properly throttled to avoid the high costs of mis-prediction, while most of the gains of value prediction can be more easily achieved by exploiting silent stores. We also show that dynamic synchronization is quite effective for most benchmarks, while hardware instruction prioritization is not. Overall, we find that these techniques have great potential for improving the performance of TLS.
Citations: 121
User-level communication in cluster-based servers
E. V. Carrera, S. Rao, L. Iftode, R. Bianchini
DOI: 10.1109/HPCA.2002.995717
Abstract: Clusters of commodity computers are currently being used to provide the scalability required by several popular Internet services. In this paper we evaluate an efficient cluster-based WWW server, as a function of the characteristics of the intra-cluster communication architecture. More specifically, we evaluate the impact of processor overhead, network bandwidth, remote memory writes, and zero-copy data transfers on the performance of our server. Our experimental results with an 8-node cluster and four real WWW traces show that network bandwidth affects the performance of our server by only 6%. In contrast, user-level communication can improve performance by as much as 29%. Low processor overhead, remote memory writes, and zero-copy all make small contributions towards this overall gain. To be able to extrapolate from our experimental results, we use an analytical model to assess the performance of our server under different workload characteristics, different numbers of cluster nodes, and higher performance systems. Our modeling results show that higher gains (of up to 55%) can be accrued for workloads with large working sets and next-generation servers running on large clusters.
Citations: 51
The FAB predictor: using Fourier analysis to predict the outcome of conditional branches
Martin Kämpe, P. Stenström, M. Dubois
DOI: 10.1109/HPCA.2002.995712
Abstract: This paper proposes to transform the branch outcome history from the time domain to the frequency domain. With our proposed Fourier Analysis Branch (FAB) predictor, we can represent long periodic branch history patterns - as long as 2^13 bits - with a realistic number of bits (52 bits). We evaluate the potential gains of the FAB predictor by considering a hybrid branch predictor in which each branch is predicted using a static scheme, the 2-bit dynamic scheme, the PAp and GAp schemes, and our FAB predictor. By including our FAB predictor in the hybrid predictor, it is possible to cut the misprediction rate of integer applications in the SPEC95 suite by between 5 and 50% with an average of 20%. Besides evaluating its performance, this paper shows some key properties of our FAB predictor and presents some possible implementation approaches.
Citations: 10
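The core idea — moving the outcome history into the frequency domain so a long periodic pattern becomes a few dominant coefficients — can be sketched with a plain DFT: map taken/not-taken to ±1, find the strongest frequency, and predict the next outcome from one period earlier. A pure-Python sketch on a hypothetical history (the real FAB predictor's encoding and hardware are far more compact):

```python
import cmath

# Sketch of frequency-domain branch prediction in the spirit of the FAB
# predictor: map outcomes to +/-1, take a DFT, recover the dominant
# period, and predict the next outcome from one period back.

def dominant_period(history):
    n = len(history)
    x = [1.0 if b else -1.0 for b in history]
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2 + 1):          # skip DC; positive frequencies only
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return round(n / best_k)                # period, in branch outcomes

def predict_next(history):
    p = dominant_period(history)
    return history[len(history) - p]        # repeat the pattern one period back

# A branch that is taken twice then not taken, repeating with period 3.
hist = [1, 1, 0] * 8
print(dominant_period(hist))                # -> 3
print(predict_next(hist))                   # -> 1
```

On a perfectly periodic history the dominant bin recovers the period exactly; the paper's contribution is doing this with tens of bits of state rather than an explicit 2^13-bit history.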
A new memory monitoring scheme for memory-aware scheduling and partitioning
G. Suh, S. Devadas, L. Rudolph
DOI: 10.1109/HPCA.2002.995703
Abstract: We propose a low overhead, online memory monitoring scheme utilizing a set of novel hardware counters. The counters indicate the marginal gain in cache hits as the size of the cache is increased, which gives the cache miss-rate as a function of cache size. Using the counters, we describe a scheme that enables an accurate estimate of the isolated miss-rates of each process as a function of cache size under the standard LRU replacement policy. This information can be used to schedule jobs or to partition the cache to minimize the overall miss-rate. The data collected by the monitors can also be used by an analytical model of cache and memory behavior to produce a more accurate overall miss-rate for the collection of processes sharing a cache in both time and space. This overall miss-rate can be used to improve scheduling and partitioning schemes.
Citations: 324
Microarchitectural simulation and control of di/dt-induced power supply voltage variation
Edward T. Grochowski, D. Ayers, V. Tiwari
DOI: 10.1109/HPCA.2002.995694
Abstract: As the power consumption of modern high-performance microprocessors increases beyond 100 W, power becomes an increasingly important design consideration. This paper presents a novel technique to simulate power supply voltage variation as a result of varying activity levels within the microprocessor when executing typical software. The voltage simulation capability may be added to existing microarchitecture simulators that determine the activities of each functional block on a clock-by-clock basis. We then discuss how the same technique can be implemented in logic on the microprocessor die to enable real-time computation of current consumption and power supply voltage. When used in a feedback loop, this logic makes it possible to control the microprocessor's activities to reduce demands on the power delivery system. With on-die voltage computation and di/dt control, we show that a significant reduction in power supply voltage variation may be achieved with little performance loss or average power increase.
Citations: 90
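The clock-by-clock voltage computation can be sketched with a simple series R-L model of the power delivery network: per-cycle current is summed from the active blocks and V = Vdd − I·R − L·dI/dt, and a di/dt controller throttles activity so the cycle-to-cycle current step stays bounded. All constants and per-block currents below are hypothetical; real power-delivery models are higher order:

```python
# Sketch of di/dt-induced supply voltage variation: per-cycle current is
# summed from active blocks, and a series R-L model of the power delivery
# network gives V = Vdd - I*R - L*dI/dt. A simple di/dt limiter throttles
# activity so the cycle-to-cycle current step stays bounded.

VDD, R, L = 1.5, 0.001, 1e-11        # hypothetical supply and parasitics
CYCLE = 1e-9                          # 1 GHz clock
BLOCK_CURRENT = {'fetch': 8.0, 'alu': 12.0, 'fpu': 20.0}  # amps when active

def voltage_trace(activity, di_limit=None):
    trace, prev_i = [], 0.0
    for active in activity:
        i = sum(BLOCK_CURRENT[b] for b in active)
        if di_limit is not None:      # di/dt control: clamp the current step
            i = max(prev_i - di_limit, min(prev_i + di_limit, i))
        v = VDD - i * R - L * (i - prev_i) / CYCLE
        trace.append(v)
        prev_i = i
    return trace

idle_then_burst = [()] * 3 + [('fetch', 'alu', 'fpu')] * 3
uncontrolled = voltage_trace(idle_then_burst)
controlled = voltage_trace(idle_then_burst, di_limit=10.0)
assert min(controlled) > min(uncontrolled)   # throttling shrinks the droop
```

The trade-off the abstract describes shows up directly: the limiter shrinks the worst-case droop at an idle-to-burst transition, but the core now takes a few extra cycles to reach full activity, which is the small performance cost of di/dt control.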