10th International Symposium on High Performance Computer Architecture (HPCA'04)最新文献

筛选
英文 中文
Exploring Wakeup-Free Instruction Scheduling 探索无唤醒指令调度
Jie S. Hu, N. Vijaykrishnan, M. J. Irwin
{"title":"Exploring Wakeup-Free Instruction Scheduling","authors":"Jie S. Hu, N. Vijaykrishnan, M. J. Irwin","doi":"10.1109/HPCA.2004.10014","DOIUrl":"https://doi.org/10.1109/HPCA.2004.10014","url":null,"abstract":"Design of wakeup-free issue queues is becoming desirable due to the increasing complexity associated with broadcast-based instruction wakeup. The effectiveness of most wakeup-free issue queue designs is critically based on their success in predicting the issue latency of an instruction accurately. Consequently, the goal of this paper is to explore the predictability of instruction issue latency under different design constraints and to identify the impediments to performance in such wakeup-free architectures. Our results indicate that structural problems in promoting instructions to the head of the instruction queue from where they are issued in wakeup-free architectures, the limited number of candidate instructions that can be considered for instruction issue, and the resource conflicts due to non-availability of issue ports all have a significant impact in degrading the performance of broadcast free architectures. Based on these observation, we explore an architecture that attempts to overcome the structural limitations by employing traditional selection logic and by using pre-check logic to reduce the impact of resource conflicts while still employing a wakeup-free strategy based on predicted instruction issue latencies. Finally, we improve this technique by limiting the selection logic to a small segment of the issue queue.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128249154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 32
Synthesizing Representative I/O Workloads for TPC-H 综合TPC-H的典型I/O工作负载
Jianyong Zhang, A. Sivasubramaniam, H. Franke, N. Gautam, Yanyong Zhang, S. Nagar
{"title":"Synthesizing Representative I/O Workloads for TPC-H","authors":"Jianyong Zhang, A. Sivasubramaniam, H. Franke, N. Gautam, Yanyong Zhang, S. Nagar","doi":"10.1109/HPCA.2004.10019","DOIUrl":"https://doi.org/10.1109/HPCA.2004.10019","url":null,"abstract":"Synthesizing I/O requests that can accurately capture workload behavior is extremely valuable for the design, implementation and optimization of disk subsystems. This paper presents a synthetic workload generator for TPC-H, an important decision-support commercial workload, by completely characterizing the arrival and access patterns of its queries. We present a novel approach for parameterizing the behavior of inter-mingling streams of sequential requests, and exploit correlations between multiple attributes of these requests, to generate disk block-level traces that are shown to accurately mimic the behavior of a real trace in terms of response time characteristics for each TPC-H query.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130119132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 55
Link-time path-sensitive memory redundancy elimination 链路时间路径敏感内存冗余消除
Manel Fernández, R. Espasa
{"title":"Link-time path-sensitive memory redundancy elimination","authors":"Manel Fernández, R. Espasa","doi":"10.1109/HPCA.2004.10009","DOIUrl":"https://doi.org/10.1109/HPCA.2004.10009","url":null,"abstract":"Optimizations performed at link-time or directly applied to final program executables have received increased attention in recent years. We discuss the discovery and elimination of redundant memory operations in the context of a link-time optimizer, an optimization that we call memory redundancy elimination (MRE). Previous research showed that existing MRE techniques are mainly based on path-insensitive information, which causes many MRE opportunities to be lost. We present a new technique for eliminating redundant loads in a path-sensitive fashion, by using a novel alias analysis algorithm that is able to expose path-sensitive memory redundancies. We also extend our previous work by removing both redundant and dead stores. Our experiments show that around 75% of load and 10% of store references in a program can be considered redundant, because they are accessing memory locations that have been referenced less than 256 memory instructions away. By combining our previous optimizations for eliminating load redundancies with the new techniques developed, we show that around 18% of the loads and 8% of the stores can be detected and eliminated, which translates into a 10% reduction in execution time.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126853548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
Understanding scheduling replay schemes 理解调度重放方案
I. Kim, Mikko H. Lipasti
{"title":"Understanding scheduling replay schemes","authors":"I. Kim, Mikko H. Lipasti","doi":"10.1109/HPCA.2004.10011","DOIUrl":"https://doi.org/10.1109/HPCA.2004.10011","url":null,"abstract":"Modern microprocessors adopt speculative scheduling techniques where instructions are scheduled several clock cycles before they actually execute. Due to this scheduling delay, scheduling misses should be recovered across the multiple levels of dependence chains in order to prevent further unnecessary execution. We explore the design space of various scheduling replay schemes that prevent the propagation of scheduling misses, and find that current and proposed replay schemes do not scale well and require instructions to execute in correct data dependence order, since they track dependences among instructions within the instruction window as a part of the scheduling or execution process. We propose token-based selective replay that moves the dependence information propagation loop out of the scheduler, enabling lower complexity in the scheduling logic and support for data-speculation techniques at the expense of marginal IPC degradation compared to an ideal selective replay scheme.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124177449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 69
Stream register files with indexed access 具有索引访问的流寄存器文件
N. Jayasena, M. Erez, Jung Ho Ahn, W. Dally
{"title":"Stream register files with indexed access","authors":"N. Jayasena, M. Erez, Jung Ho Ahn, W. Dally","doi":"10.1109/HPCA.2004.10007","DOIUrl":"https://doi.org/10.1109/HPCA.2004.10007","url":null,"abstract":"Many current programmable architecture designed to exploit data parallelism require computation to be structured to operate on sequentially accessed vectors or streams of data. Applications with less regular data access patterns perform sub-optimally on such architectures. We present a register file for streams (SRF) that allows arbitrary, indexed accesses. Compared to sequential SRF access, indexed access captures more temporal locality, reduces data replication in the SRF, and provides efficient support for certain types of complex access patterns. Our simulations show that indexed SRF access provides speedups of 1.03x to 4.1x and memory bandwidth reductions of up to 95% over sequential SRF access for a set of benchmarks representative of data-parallel applications with irregular accesses. Indexed SRF access also provides greater speedups than caches for a number of application classes despite significantly lower hardware costs. The area overhead of our indexed SRF implementation is 11%-22% over a sequentially accessed SRF, which corresponds to a modest 1.5%-3% increase in the total die area of a typical stream processor.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"451 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133271077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 53
Using prime numbers for cache indexing to eliminate conflict misses 使用素数进行缓存索引以消除冲突缺失
Mazen Kharbutli, Keith Irwin, Yan Solihin, Jaejin Lee
{"title":"Using prime numbers for cache indexing to eliminate conflict misses","authors":"Mazen Kharbutli, Keith Irwin, Yan Solihin, Jaejin Lee","doi":"10.1109/HPCA.2004.10015","DOIUrl":"https://doi.org/10.1109/HPCA.2004.10015","url":null,"abstract":"Using alternative cache indexing/hashing functions is a popular technique to reduce conflict misses by achieving a more uniform cache access distribution across the sets in the cache. Although various alternative hashing functions have been demonstrated to eliminate the worst case conflict behavior, no study has really analyzed the pathological behavior of such hashing functions that often result in performance slowdown. We present an in-depth analysis of the pathological behavior of cache hashing functions. Based on the analysis, we propose two new hashing functions: prime modulo and prime displacement that are resistant to pathological behavior and yet are able to eliminate the worst case conflict behavior in the L2 cache. We show that these two schemes can be implemented in fast hardware using a set of narrow add operations, with negligible fragmentation in the L2 cache. We evaluate the schemes on 23 memory intensive applications. For applications that have nonuniform cache accesses, both prime modulo and prime displacement hashing achieve an average speedup of 1.27 compared to traditional hashing, without slowing down any of the 23 benchmarks. We also evaluate using multiple prime displacement hashing functions in conjunction with a skewed associative L2 cache. The skewed associative cache achieves a better average speedup at the cost of some pathological behavior that slows down four applications by up to 7%.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122004026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 114
Low-complexity distributed issue queue 低复杂度的分布式问题队列
J. Abella, Antonio González
{"title":"Low-complexity distributed issue queue","authors":"J. Abella, Antonio González","doi":"10.1109/HPCA.2004.10013","DOIUrl":"https://doi.org/10.1109/HPCA.2004.10013","url":null,"abstract":"As technology evolves, power density significantly increases and cooling systems become more complex and expensive. The issue logic is one of the processor hotspots and, at the same time, its latency is crucial for the processor performance. We present a low-complexity FP issue logic (MB/spl I.bar/distr) that achieves high performance with small energy requirements. The MB/spl I.bar/distr scheme is based on classifying instructions and dispatching them into a set of queues depending on their data dependences. These instructions are selected for issuing based on an estimation of when their operands will be available, so the conventional wakeup activity is not required. Additionally, the functional units are distributed across the different queues. The energy required by the proposed scheme is substantially lower than that required by a conventional issue design, even if the latter has the ability of waking-up only unready operands. MB/spl I.bar/distr scheme reduces the energy-delay product by 35% and the energy-delay product by 18% with respect to a state-of-the-art approach.","PeriodicalId":145009,"journal":{"name":"10th International Symposium on High Performance Computer Architecture (HPCA'04)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130756159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信