10th International Symposium on High Performance Computer Architecture (HPCA'04): Latest Papers, Page 3

Exploring Wakeup-Free Instruction Scheduling

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI: 10.1109/HPCA.2004.10014

Jie S. Hu, N. Vijaykrishnan, M. J. Irwin

Abstract: Design of wakeup-free issue queues is becoming desirable due to the increasing complexity associated with broadcast-based instruction wakeup. The effectiveness of most wakeup-free issue queue designs depends critically on how accurately they predict the issue latency of an instruction. Consequently, the goal of this paper is to explore the predictability of instruction issue latency under different design constraints and to identify the impediments to performance in such wakeup-free architectures. Our results indicate that three factors significantly degrade the performance of broadcast-free architectures: structural problems in promoting instructions to the head of the instruction queue, from where they are issued in wakeup-free architectures; the limited number of candidate instructions that can be considered for issue; and resource conflicts due to non-availability of issue ports. Based on these observations, we explore an architecture that attempts to overcome the structural limitations by employing traditional selection logic and by using pre-check logic to reduce the impact of resource conflicts, while still employing a wakeup-free strategy based on predicted instruction issue latencies. Finally, we improve this technique by limiting the selection logic to a small segment of the issue queue.
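As a rough illustration of what predicting issue latency can mean, the Python sketch below estimates each instruction's issue cycle at dispatch time from the predicted ready times of its source registers, with no wakeup broadcast. It is a toy model and not the design evaluated in the paper: the function name `predict_issue_cycles`, the `(dest, srcs, op)` instruction tuples, and the fixed latency table are all assumptions made for this example.

```python
def predict_issue_cycles(instructions, latencies, dispatch_cycle=0):
    """Predict the issue cycle of each instruction in program order.

    instructions: list of (dest, srcs, op) tuples (hypothetical format).
    latencies:    dict mapping op name to an assumed fixed execution latency.
    """
    ready_at = {}    # register -> predicted cycle at which its value is available
    predicted = []
    for dest, srcs, op in instructions:
        # An instruction can issue once all of its sources are predicted ready.
        # Registers with no in-flight producer are assumed ready at dispatch.
        issue = max([dispatch_cycle] +
                    [ready_at.get(s, dispatch_cycle) for s in srcs])
        predicted.append(issue)
        if dest is not None:
            # The result is predicted to be available after the op's latency.
            ready_at[dest] = issue + latencies[op]
    return predicted


# Assumed latencies, chosen only for illustration.
LATENCIES = {"load": 3, "add": 1, "mul": 4}

program = [
    ("r1", ["r0"], "load"),        # r1 <- mem[r0]
    ("r2", ["r1"], "add"),         # depends on the load
    ("r3", ["r1", "r2"], "mul"),   # depends on the load and the add
    ("r4", ["r0"], "add"),         # independent of the chain above
]

print(predict_issue_cycles(program, LATENCIES))
```

The sketch deliberately ignores issue-port conflicts and variable load latency. Those omissions matter: the abstract names resource conflicts as one of the factors that degrade such predictions in practice.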

Citations: 32

Synthesizing Representative I/O Workloads for TPC-H

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI: 10.1109/HPCA.2004.10019

Jianyong Zhang, A. Sivasubramaniam, H. Franke, N. Gautam, Yanyong Zhang, S. Nagar

Citations: 55

Link-time path-sensitive memory redundancy elimination

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI: 10.1109/HPCA.2004.10009

Manel Fernández, R. Espasa

Abstract: Optimizations performed at link-time or directly applied to final program executables have received increased attention in recent years. We discuss the discovery and elimination of redundant memory operations in the context of a link-time optimizer, an optimization that we call memory redundancy elimination (MRE). Previous research showed that existing MRE techniques are mainly based on path-insensitive information, which causes many MRE opportunities to be lost. We present a new technique for eliminating redundant loads in a path-sensitive fashion, by using a novel alias analysis algorithm that is able to expose path-sensitive memory redundancies. We also extend our previous work by removing both redundant and dead stores. Our experiments show that around 75% of load and 10% of store references in a program can be considered redundant, because they are accessing memory locations that have been referenced less than 256 memory instructions away. By combining our previous optimizations for eliminating load redundancies with the new techniques developed, we show that around 18% of the loads and 8% of the stores can be detected and eliminated, which translates into a 10% reduction in execution time.
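The redundancy measure quoted in the abstract (a reference counts as redundant when its location was referenced less than 256 memory instructions earlier) can be sketched on a memory trace as follows. This is one possible reading of that sentence, not the authors' measurement tool: the trace format of `(kind, address)` pairs and the function name `redundancy_fractions` are hypothetical, and the sketch only compares addresses, without checking stored values or aliasing.

```python
def redundancy_fractions(trace, window=256):
    """Fraction of loads and stores whose address was referenced
    fewer than `window` memory instructions earlier.

    trace: list of (kind, address) pairs, kind being "load" or "store"
           (a hypothetical trace format assumed for this example).
    """
    last_seen = {}                      # address -> index of its latest reference
    totals = {"load": 0, "store": 0}
    hits = {"load": 0, "store": 0}
    for i, (kind, addr) in enumerate(trace):
        totals[kind] += 1
        # Distance is measured in memory instructions, i.e. trace positions.
        if addr in last_seen and i - last_seen[addr] < window:
            hits[kind] += 1
        last_seen[addr] = i
    return {k: (hits[k] / totals[k] if totals[k] else 0.0) for k in totals}


# Tiny made-up trace, for illustration only.
example_trace = [
    ("load", 0x10),
    ("load", 0x10),    # same address one reference earlier
    ("store", 0x20),
    ("load", 0x30),
    ("store", 0x20),   # same address two references earlier
]

print(redundancy_fractions(example_trace))
```

A measure of this kind only bounds the opportunity. Actually removing a load or store requires the alias and path information the abstract describes, which is why the eliminated fractions it reports (18% and 8%) are far below the measured redundancy.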

Citations: 3

Understanding scheduling replay schemes

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI: 10.1109/HPCA.2004.10011

I. Kim, Mikko H. Lipasti

Citations: 69

Stream register files with indexed access

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI: 10.1109/HPCA.2004.10007

N. Jayasena, M. Erez, Jung Ho Ahn, W. Dally

Citations: 53

Using prime numbers for cache indexing to eliminate conflict misses

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI: 10.1109/HPCA.2004.10015

Mazen Kharbutli, Keith Irwin, Yan Solihin, Jaejin Lee

Abstract: Using alternative cache indexing/hashing functions is a popular technique to reduce conflict misses by achieving a more uniform cache access distribution across the sets in the cache. Although various alternative hashing functions have been demonstrated to eliminate the worst-case conflict behavior, no study has really analyzed the pathological behavior of such hashing functions, which often results in performance slowdown. We present an in-depth analysis of the pathological behavior of cache hashing functions. Based on the analysis, we propose two new hashing functions, prime modulo and prime displacement, that are resistant to pathological behavior and yet are able to eliminate the worst-case conflict behavior in the L2 cache. We show that these two schemes can be implemented in fast hardware using a set of narrow add operations, with negligible fragmentation in the L2 cache. We evaluate the schemes on 23 memory-intensive applications. For applications that have nonuniform cache accesses, both prime modulo and prime displacement hashing achieve an average speedup of 1.27 compared to traditional hashing, without slowing down any of the 23 benchmarks. We also evaluate using multiple prime displacement hashing functions in conjunction with a skewed associative L2 cache. The skewed associative cache achieves a better average speedup at the cost of some pathological behavior that slows down four applications by up to 7%.
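The abstract does not spell out the hash functions, so the following Python sketch rests on a stated assumption: that prime modulo indexing takes the block address modulo the largest prime not exceeding the number of sets, whereas traditional indexing takes it modulo the (power-of-two) number of sets. The function names, the 2048-set cache, and the strided access pattern are all hypothetical choices for illustration.

```python
def largest_prime_at_most(n):
    """Return the largest prime <= n (trial division; adequate for small n)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    while n >= 2 and not is_prime(n):
        n -= 1
    if n < 2:
        raise ValueError("no prime at or below the given value")
    return n


def traditional_index(block_addr, n_sets):
    """Conventional indexing: low-order bits of the block address."""
    return block_addr % n_sets


def prime_modulo_index(block_addr, n_sets):
    """Assumed prime modulo indexing: block address modulo a prime <= n_sets."""
    return block_addr % largest_prime_at_most(n_sets)


def sets_touched(index_fn, n_sets, stride, n_accesses):
    """Number of distinct sets hit by a strided sequence of block addresses."""
    return len({index_fn(i * stride, n_sets) for i in range(n_accesses)})


N_SETS = 2048    # assumed number of cache sets
STRIDE = 2048    # stride in blocks, chosen to be a worst case for traditional indexing

print(sets_touched(traditional_index, N_SETS, STRIDE, 64))
print(sets_touched(prime_modulo_index, N_SETS, STRIDE, 64))
```

Under these assumptions the strided accesses all collide in a single set with traditional indexing but spread over 64 distinct sets with the prime modulus. The prime used is 2039, so 9 of the 2048 sets would go unused, which is one way to read the abstract's "negligible fragmentation".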

Citations: 114

Low-complexity distributed issue queue

10th International Symposium on High Performance Computer Architecture (HPCA'04) Pub Date : 2004-02-14 DOI: 10.1109/HPCA.2004.10013

J. Abella, Antonio González

Citations: 21