2001 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS): Latest Publications

Efficient profile-based evaluation of randomising set index functions for cache memories
H. Vandierendonck, K. D. Bosschere
{"title":"Efficient profile-based evaluation of randomising set index functions for cache memories","authors":"H. Vandierendonck, K. D. Bosschere","doi":"10.1109/ISPASS.2001.990687","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990687","url":null,"abstract":"The performance of direct mapped caches is degraded by conflict misses. It has been shown that conflict misses can be reduced by using randomising set index functions, such that repeated conflicts are avoided. However, optimising the set index function requires time-consuming simulations, because the design space of randomising set index functions is very large. Therefore, we developed a profile-based technique that allows one to make a fast estimation of the miss ratio incurred by a set index function. Using this technique, one can perform a fast, initial exploration of the design space of set index functions, followed by a slower, but more accurate, analysis using simulation. The profile-based technique is based on a new representation of randomising set index functions using null spaces. The profile-based technique consists of two phases. In the first phase, a program is profiled and in the second phase, a score is computed from the profile data and the null space of a set index function. We show that the computed score closely reflects the miss ratio incurred by that set index function. Computing a score is a simple operation that requires no simulation time. Therefore, only one profiling run is required to estimate the miss ratios for a wide range of set index functions.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132149797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
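The abstract above mentions randomising set index functions and their null spaces. The following is a minimal sketch of that idea, assuming an XOR-based index function (each index bit is the parity of a chosen subset of block-address bits). It is not the paper's profile-based scoring technique. The masks, the 4-bit addresses, and all function names are hypothetical and chosen only for illustration. For such a linear function, two blocks map to the same set exactly when the XOR of their addresses lies in the function's null space.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def set_index(block_addr: int, masks: list[int]) -> int:
    """XOR-based set index: index bit i is the parity of (block_addr & masks[i])."""
    idx = 0
    for bit, mask in enumerate(masks):
        idx |= parity(block_addr & mask) << bit
    return idx


def in_null_space(vector: int, masks: list[int]) -> bool:
    """A vector is in the null space if the index function maps it to 0."""
    return set_index(vector, masks) == 0


def direct_mapped_misses(trace: list[int], masks: list[int]) -> int:
    """Count misses of a direct-mapped cache that uses the given index function."""
    cache: dict[int, int] = {}  # set index -> block address currently stored
    misses = 0
    for addr in trace:
        s = set_index(addr, masks)
        if cache.get(s) != addr:
            misses += 1
            cache[s] = addr
    return misses


# Hypothetical 4-set cache indexed from 4-bit block addresses.
MODULO_MASKS = [0b0001, 0b0010]  # conventional: index = two low address bits
XOR_MASKS = [0b0101, 0b1010]     # index bit 0 = a0^a2, index bit 1 = a1^a3

# Blocks 0 and 4 accessed alternately: they collide under modulo indexing.
trace = [0b0000, 0b0100] * 4

modulo_misses = direct_mapped_misses(trace, MODULO_MASKS)  # 8: every access misses
xor_misses = direct_mapped_misses(trace, XOR_MASKS)        # 2: only the cold misses
```

Here the difference vector `0b0000 ^ 0b0100` is in the null space of the modulo function but not in that of the XOR function, which is why only the former suffers repeated conflict misses on this trace.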
MASE: a novel infrastructure for detailed microarchitectural modeling
E. Larson, Saugata Chatterjee, T. Austin
{"title":"MASE: a novel infrastructure for detailed microarchitectural modeling","authors":"E. Larson, Saugata Chatterjee, T. Austin","doi":"10.1109/ISPASS.2001.990668","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990668","url":null,"abstract":"MASE (Micro Architectural Simulation Environment) is a novel infrastructure that provides a flexible and capable environment to model modern microarchitectures. Many popular simulators, such as SimpleScalar, are predominantly trace-based where the performance simulator is driven by a trace of instructions read from a file or generated on-the-fly by a functional simulator. Trace-driven simulators are well-suited for oracle studies and provide a clean division between performance modeling and functional emulation. A major problem with this approach, however, is that it does not accurately model timing-dependent computations, an increasing trend in microarchitecture designs such as those found in multiprocessor systems. MASE implements a micro-functional performance model that combines timing and functional components into a single core. In addition, MASE incorporates a trace-driven functional component used to implement oracle studies and check the results of instructions as they commit. The check feature reduces the burden of correctness on the micro-functional core and also serves as a powerful debugging aid. MASE also implements a callback scheduling interface to support resources with nondeterministic latencies such as those found in highly concurrent memory systems. MASE was built on top of the current version of SimpleScalar. Analyses show that the performance statistics are comparable without a significant increase in simulation time.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127805434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 88
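The abstract above describes a callback scheduling interface for resources whose latency is not known in advance. As a rough illustration of the general idea only, the sketch below shows a cycle-driven event queue in which a resource registers a callback to fire when its request completes. This is not MASE's actual interface: the class `CallbackScheduler`, its methods, and the latencies used are all hypothetical.

```python
import heapq
from typing import Callable


class CallbackScheduler:
    """Hypothetical cycle-driven scheduler: callbacks fire when their cycle arrives."""

    def __init__(self) -> None:
        self.cycle = 0
        self._queue: list[tuple[int, int, Callable[[int], None]]] = []
        self._seq = 0  # tie-breaker so equal-cycle events keep insertion order

    def schedule(self, latency: int, callback: Callable[[int], None]) -> None:
        """Register a callback to run `latency` cycles from the current cycle."""
        heapq.heappush(self._queue, (self.cycle + latency, self._seq, callback))
        self._seq += 1

    def tick(self) -> None:
        """Advance one cycle and run every callback that is now due."""
        self.cycle += 1
        while self._queue and self._queue[0][0] <= self.cycle:
            _, _, callback = heapq.heappop(self._queue)
            callback(self.cycle)


# Usage: the memory model, not the core, decides each request's latency.
sched = CallbackScheduler()
completed: list[tuple[str, int]] = []


def on_done(tag: str) -> Callable[[int], None]:
    """Build a completion callback that records the tag and finishing cycle."""
    return lambda cycle: completed.append((tag, cycle))


sched.schedule(3, on_done("load A"))  # assumed latency, e.g. a cache miss
sched.schedule(1, on_done("load B"))  # assumed latency, e.g. a cache hit
for _ in range(3):
    sched.tick()
```

The point of this design is that the core never needs to know a request's latency up front; it only reacts when the callback fires, so loads can complete out of issue order.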
An empirical study of the scalability aspects of instruction distribution algorithms for clustered processors
Aneesh Aggarwal, M. Franklin
{"title":"An empirical study of the scalability aspects of instruction distribution algorithms for clustered processors","authors":"Aneesh Aggarwal, M. Franklin","doi":"10.1109/ISPASS.2001.990696","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990696","url":null,"abstract":"In the sub-micron technology era, wire delays are becoming much more important than gate delays, making it particularly attractive to go for decentralized processors. A number of algorithms have already been proposed for distributing instructions among multiple clusters. In this paper we qualitatively and quantitatively analyze the effect of various hardware parameters on the scalability of different instruction distribution algorithms. Using a set of realistic system parameters, we examine performance differences resulting from different distribution algorithms as well as from specific implementation issues such as the type of interconnect, the fetch size, the cluster issue width, and the cluster window size. Our studies have found that those distribution algorithms that perform relatively better with 4 or fewer clusters are generally not the best ones for a larger number of clusters. Also, the relative performance and scalability of the algorithms are sensitive to different hardware parameters. We also found that, among the existing algorithms, there is no single algorithm that works uniformly best across all hardware configurations. This motivates the need to develop alternate interconnects and instruction distribution algorithms.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127066387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 36