Proceedings Eighth International Symposium on High Performance Computer Architecture最新文献

筛选
英文 中文
Using complete machine simulation for software power estimation: the SoftWatt approach 用整机模拟进行软件功率估计:SoftWatt方法
S. Gurumurthi, A. Sivasubramaniam, M. J. Irwin, N. Vijaykrishnan, M. Kandemir, Tao Li, L. John
{"title":"Using complete machine simulation for software power estimation: the SoftWatt approach","authors":"S. Gurumurthi, A. Sivasubramaniam, M. J. Irwin, N. Vijaykrishnan, M. Kandemir, Tao Li, L. John","doi":"10.1109/HPCA.2002.995705","DOIUrl":"https://doi.org/10.1109/HPCA.2002.995705","url":null,"abstract":"Power dissipation has become one of the most critical factors for the continued development of both high-end and low-end computer systems. We present a complete system power simulator, called SoftWatt, that models the CPU, memory hierarchy, and a low-power disk subsystem and quantifies the power behavior of both the application and operating system. This tool, built on top of the SimOS infrastructure, uses validated analytical energy models to identify the power hotspots in the system components, capture relative contributions of the user and kernel code to the system power profile, identify the power-hungry operating system services and characterize the variance in kernel power profile with respect to workload. Our results using Spec JVM98 benchmark suite emphasize the importance of complete system simulation to understand the power impact of architecture and operating system on application execution.","PeriodicalId":408620,"journal":{"name":"Proceedings Eighth International Symposium on High Performance Computer Architecture","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131187140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 237
Loose loops sink chips 松环沉芯片
Eric Borch, Eric Tune, Srilatha Manne, J. Emer
{"title":"Loose loops sink chips","authors":"Eric Borch, Eric Tune, Srilatha Manne, J. Emer","doi":"10.1109/HPCA.2002.995719","DOIUrl":"https://doi.org/10.1109/HPCA.2002.995719","url":null,"abstract":"This paper explores the concept of micro-architectural loops and discusses their impact on processor pipelines. In particular, we establish the relationship between loose loops and pipeline length and configuration, and show their impact on performance. We then evaluate the load resolution loop in detail and propose the distributed register algorithm (DRA) as a way of reducing this loop. It decreases the performance loss due to load mis-speculations by reducing the issue-to-execute latency in the pipeline. A new loose loop is introduced into the pipeline by the DRA, but the frequency of mis-speculations is very low. The reduction in latency from issue to execute, along with a low mis-speculation rate in the DRA result in up to a 4% to 15% improvement in performance using a detailed architectural simulator.","PeriodicalId":408620,"journal":{"name":"Proceedings Eighth International Symposium on High Performance Computer Architecture","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130435029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 175
Fine-grain priority scheduling on multi-channel memory systems 多通道存储系统的细粒度优先级调度
Zhichun Zhu, Zhao Zhang, Xiaodong Zhang
{"title":"Fine-grain priority scheduling on multi-channel memory systems","authors":"Zhichun Zhu, Zhao Zhang, Xiaodong Zhang","doi":"10.1109/HPCA.2002.995702","DOIUrl":"https://doi.org/10.1109/HPCA.2002.995702","url":null,"abstract":"Configurations of contemporary DRAM memory systems become increasingly complex. A recent study shows that the application performance is highly sensitive to choices of configurations. In this study we show that, by utilizing fine-grain priority access scheduling, we are able to find a workload independent configuration that achieves optimal performance on a multichannel memory system. Our approach can well utilize the available high concurrency and high bandwidth on such memory systems, and effectively reduce the memory stall time of memory-intensive applications. Conducting execution-driven simulation of a 4-way issue, a 2 GHz processor, we show that the average performance improvement for fifteen memory-intensive SPEC2000 programs by using an optimized fine-grain priority scheduling is about 13% and 8% for a 2-channel and a 4-channel Direct Rambus DRAM memory system, respectively, compared with gang scheduling. Compared with burst scheduling, the average performance improvement is 16% and 14% for the 2-channel and 4-channel memory systems, respectively.","PeriodicalId":408620,"journal":{"name":"Proceedings Eighth International Symposium on High Performance Computer Architecture","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128509882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 31
Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors 通过学习多处理器推测并行化中的跨线程违规来消除挤压
Marcelo H. Cintra, J. Torrellas
{"title":"Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors","authors":"Marcelo H. Cintra, J. Torrellas","doi":"10.1109/HPCA.2002.995697","DOIUrl":"https://doi.org/10.1109/HPCA.2002.995697","url":null,"abstract":"With speculative thread-level parallelization, codes that cannot be fully compiler-analyzed are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation, it squashes offending threads and resumes execution. Unfortunately, frequent squashing cripples performance. This paper proposes a new framework of hardware mechanisms to eliminate most squashes due to data dependences in multiprocessors. The framework works by learning and predicting violations, and applying delayed-disambiguation, value prediction, and stall and release. The framework is suited for directory-based multiprocessors that track memory accesses at the system level with the coarse granularity of memory lines. Simulations of a 16-processor machine show that the framework is very effective. By adding our framework to a speculative CC-NUMA with 64-byte memory lines, we speed-up applications by an average of 4.3 times. Moreover, the resulting system is even 23% faster than a machine that tracks memory accesses at the fine granularity of words-a sophisticated system that is not compatible with mainstream cache coherence protocols.","PeriodicalId":408620,"journal":{"name":"Proceedings Eighth International Symposium on High Performance Computer Architecture","volume":"332 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114371152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 90
Thread-spawning schemes for speculative multithreading 推测多线程的线程生成方案
P. Marcuello, Antonio González
{"title":"Thread-spawning schemes for speculative multithreading","authors":"P. Marcuello, Antonio González","doi":"10.1109/HPCA.2002.995698","DOIUrl":"https://doi.org/10.1109/HPCA.2002.995698","url":null,"abstract":"Speculative multithreading has been recently proposed to boost performance by means of exploiting thread-level parallelism in applications difficult to parallelize. The performance of these processors heavily depends on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily-detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and a 8-cycle thread initialization penalty is considered, the performance difference between them is maintained. The speed-up over a single thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.","PeriodicalId":408620,"journal":{"name":"Proceedings Eighth International Symposium on High Performance Computer Architecture","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128020830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 76
CableS : thread control and memory management extensions for shared virtual memory clusters 电缆:线程控制和内存管理扩展,用于共享虚拟内存集群
P. Jamieson, A. Bilas
{"title":"CableS : thread control and memory management extensions for shared virtual memory clusters","authors":"P. Jamieson, A. Bilas","doi":"10.1109/HPCA.2002.995716","DOIUrl":"https://doi.org/10.1109/HPCA.2002.995716","url":null,"abstract":"Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O bound tasks. Although clusters have many advantages, their applicability in emerging areas of applications has been limited. One of the main reasons for this is the fact that clusters do not provide a single system image and thus are hard to program. In this work we address this problem by providing a single-cluster image with respect to thread and memory management. We implement our system, CableS (Cluster enabled threads), on a 32-processor cluster interconnected with a low-latency, high-bandwidth system area network and conduct an early exploration of the costs involved in providing the extra functionality. We demonstrate the versatility :of Cables with a wide range of applications and show that clusters can be used to support applications that have been written for more expensive tightly-coupled systems, With very little effort on the programmer side: (a) We run legacy pthreads applications without any major modifications. (b) We use a public domain OpenMP compiler (OdinMP) to translate OpenMP programs to pthreads and execute them on our system, with no or few modifications to the translated pthreads source code. (c) We provide an implementation of the M4 macros for our pthreads system and run the SPLASH-2 applications. We also show that the overhead introduced by the extra functionality of CableS affects the parallel section of applications that have been tuned for the shared memory abstraction only in cases where the data placement is affected by operating system (WindowsNT) limitations in virtual memory mappings granularity.","PeriodicalId":408620,"journal":{"name":"Proceedings Eighth International Symposium on High Performance Computer Architecture","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117010578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
Proceedings Eighth International Symposium on High Performance Computer Architecture 第八届高性能计算机体系结构国际研讨会论文集
{"title":"Proceedings Eighth International Symposium on High Performance Computer Architecture","authors":"","doi":"10.1109/HPCA.2002.995692","DOIUrl":"https://doi.org/10.1109/HPCA.2002.995692","url":null,"abstract":"The following topics are dealt with: energy and thermal management; speculative multithreading; memory-aware scheduling; latency tolerance and caches; speculation and prediction; multiprocessor systems; pipelining and microarchitecture; high-performance computer architecture.","PeriodicalId":408620,"journal":{"name":"Proceedings Eighth International Symposium on High Performance Computer Architecture","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114134080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 25
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信