Proceedings Eighth International Symposium on High Performance Computer Architecture: latest papers

Let's study whole-program cache behaviour analytically

Proceedings Eighth International Symposium on High Performance Computer Architecture Pub Date : 2002-02-02 DOI: 10.1109/HPCA.2002.995708

X. Vera, Jingling Xue

Citations: 83

Fine-grain priority scheduling on multi-channel memory systems

Proceedings Eighth International Symposium on High Performance Computer Architecture Pub Date : 2002-02-02 DOI: 10.1109/HPCA.2002.995702

Zhichun Zhu, Zhao Zhang, Xiaodong Zhang

Citations: 31

Loose loops sink chips

Proceedings Eighth International Symposium on High Performance Computer Architecture Pub Date : 2002-02-02 DOI: 10.1109/HPCA.2002.995719

Eric Borch, Eric Tune, Srilatha Manne, J. Emer

Citations: 175

Eliminating squashes through learning cross-thread violations in speculative parallelization for multiprocessors

Proceedings Eighth International Symposium on High Performance Computer Architecture Pub Date : 2002-02-02 DOI: 10.1109/HPCA.2002.995697

Marcelo H. Cintra, J. Torrellas

Abstract: With speculative thread-level parallelization, codes that cannot be fully compiler-analyzed are aggressively executed in parallel. If the hardware detects a cross-thread dependence violation, it squashes offending threads and resumes execution. Unfortunately, frequent squashing cripples performance. This paper proposes a new framework of hardware mechanisms to eliminate most squashes due to data dependences in multiprocessors. The framework works by learning and predicting violations, and applying delayed-disambiguation, value prediction, and stall and release. The framework is suited for directory-based multiprocessors that track memory accesses at the system level with the coarse granularity of memory lines. Simulations of a 16-processor machine show that the framework is very effective. By adding our framework to a speculative CC-NUMA with 64-byte memory lines, we speed up applications by an average of 4.3 times. Moreover, the resulting system is even 23% faster than a machine that tracks memory accesses at the fine granularity of words, a sophisticated system that is not compatible with mainstream cache coherence protocols.

Citations: 90

Thread-spawning schemes for speculative multithreading

Proceedings Eighth International Symposium on High Performance Computer Architecture Pub Date : 2002-02-02 DOI: 10.1109/HPCA.2002.995698

P. Marcuello, Antonio González

Abstract: Speculative multithreading has been recently proposed to boost performance by means of exploiting thread-level parallelism in applications difficult to parallelize. The performance of these processors heavily depends on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily-detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and an 8-cycle thread initialization penalty are considered, the performance difference between them is maintained. The speed-up over a single thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.

Citations: 76

CableS: thread control and memory management extensions for shared virtual memory clusters

Proceedings Eighth International Symposium on High Performance Computer Architecture Pub Date : 2002-02-02 DOI: 10.1109/HPCA.2002.995716

P. Jamieson, A. Bilas

Abstract: Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O bound tasks. Although clusters have many advantages, their applicability in emerging areas of applications has been limited. One of the main reasons for this is the fact that clusters do not provide a single system image and thus are hard to program. In this work we address this problem by providing a single-cluster image with respect to thread and memory management. We implement our system, CableS (Cluster enabled threads), on a 32-processor cluster interconnected with a low-latency, high-bandwidth system area network and conduct an early exploration of the costs involved in providing the extra functionality. We demonstrate the versatility of CableS with a wide range of applications and show that clusters can be used to support applications that have been written for more expensive tightly-coupled systems, with very little effort on the programmer side: (a) We run legacy pthreads applications without any major modifications. (b) We use a public domain OpenMP compiler (OdinMP) to translate OpenMP programs to pthreads and execute them on our system, with no or few modifications to the translated pthreads source code. (c) We provide an implementation of the M4 macros for our pthreads system and run the SPLASH-2 applications. We also show that the overhead introduced by the extra functionality of CableS affects the parallel section of applications that have been tuned for the shared memory abstraction only in cases where the data placement is affected by operating system (Windows NT) limitations in virtual memory mapping granularity.

Citations: 9

Proceedings Eighth International Symposium on High Performance Computer Architecture

Proceedings Eighth International Symposium on High Performance Computer Architecture Pub Date : 2002-02-02 DOI: 10.1109/HPCA.2002.995692

Citations: 25