2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS. Latest publications


Efficient profile-based evaluation of randomising set index functions for cache memories

2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS. Pub Date : 2001-11-04 DOI: 10.1109/ISPASS.2001.990687

H. Vandierendonck, K. D. Bosschere

Abstract: The performance of direct mapped caches is degraded by conflict misses. It has been shown that conflict misses can be reduced by using randomising set index functions, such that repeated conflicts are avoided. However, optimising the set index function requires time-consuming simulations, because the design space of randomising set index functions is very large. Therefore, we developed a profile-based technique that allows one to make a fast estimation of the miss ratio incurred by a set index function. Using this technique, one can perform a fast, initial exploration of the design space of set index functions, followed by a slower, but more accurate, analysis using simulation. The profile-based technique is based on a new representation of randomising set index functions using null spaces. The profile-based technique consists of two phases. In the first phase, a program is profiled and in the second phase, a score is computed from the profile data and the null space of a set index function. We show that the computed score closely reflects the miss ratio incurred by that set index function. Computing a score is a simple operation that requires no simulation time. Therefore, only one profiling run is required to estimate the miss ratios for a wide range of set index functions.
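To make the idea in the abstract above concrete, the following is a minimal sketch of how a randomising set index function can remove repeated conflicts in a direct mapped cache. It assumes an XOR-based function, which is one common family of randomising index functions; the abstract does not say which functions the authors evaluate. All names (`modulo_index`, `xor_index`, `count_misses`, `same_set`) and the toy trace are hypothetical, and the miss counter is a plain simulation, not the paper's profile-based scoring technique.

```python
def modulo_index(block_addr, index_bits):
    """Conventional index: the low-order bits of the block address."""
    return block_addr & ((1 << index_bits) - 1)


def xor_index(block_addr, index_bits):
    """Assumed randomising index: XOR the low bits with the next-higher bits."""
    mask = (1 << index_bits) - 1
    return (block_addr ^ (block_addr >> index_bits)) & mask


def count_misses(block_addrs, index_fn, index_bits):
    """Simulate a direct mapped cache and count misses for a block trace."""
    cache = {}  # set index -> block address currently stored in that set
    misses = 0
    for block in block_addrs:
        set_index = index_fn(block, index_bits)
        if cache.get(set_index) != block:
            misses += 1
            cache[set_index] = block
    return misses


def same_set(a, b, index_bits):
    """For a linear (XOR-based) function, two blocks share a set exactly
    when a ^ b maps to index 0, i.e. lies in the function's null space."""
    return xor_index(a ^ b, index_bits) == 0


# Toy trace: two blocks exactly one cache size apart, accessed alternately.
index_bits = 4
trace = [0, 16] * 8

modulo_misses = count_misses(trace, modulo_index, index_bits)
xor_misses = count_misses(trace, xor_index, index_bits)
print(modulo_misses, xor_misses)
```

With the conventional index both blocks map to set 0 and evict each other on every access, while the XOR-based index places them in different sets, so only the two compulsory misses remain. The `same_set` helper hints at why a null space is a convenient representation: for a linear index function, whether two blocks conflict depends only on the XOR of their addresses.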

Citations: 9

MASE: a novel infrastructure for detailed microarchitectural modeling

2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS. Pub Date : 1900-01-01 DOI: 10.1109/ISPASS.2001.990668

E. Larson, Saugata Chatterjee, T. Austin

Abstract: MASE (Micro Architectural Simulation Environment) is a novel infrastructure that provides a flexible and capable environment to model modern microarchitectures. Many popular simulators, such as SimpleScalar, are predominantly trace-based, where the performance simulator is driven by a trace of instructions read from a file or generated on-the-fly by a functional simulator. Trace-driven simulators are well-suited for oracle studies and provide a clean division between performance modeling and functional emulation. A major problem with this approach, however, is that it does not accurately model timing-dependent computations, an increasing trend in microarchitecture designs such as those found in multiprocessor systems. MASE implements a micro-functional performance model that combines timing and functional components into a single core. In addition, MASE incorporates a trace-driven functional component used to implement oracle studies and check the results of instructions as they commit. The check feature reduces the burden of correctness on the micro-functional core and also serves as a powerful debugging aid. MASE also implements a callback scheduling interface to support resources with nondeterministic latencies such as those found in highly concurrent memory systems. MASE was built on top of the current version of SimpleScalar. Analyses show that the performance statistics are comparable without a significant increase in simulation time.
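The callback scheduling idea mentioned in the abstract above can be illustrated with a generic event queue. This is only a sketch of the general pattern, assuming a simple discrete-event loop; it is not MASE's actual interface, and every name here (`EventQueue`, `schedule`, `on_load_done`) is hypothetical.

```python
import heapq


class EventQueue:
    """Generic discrete-event queue: callbacks fire at their completion time."""

    def __init__(self):
        self.now = 0
        self._events = []
        self._seq = 0  # tie-breaker so simultaneous events keep insertion order

    def schedule(self, delay, callback, *args):
        """Register a callback to run `delay` cycles after the current time."""
        heapq.heappush(self._events, (self.now + delay, self._seq, callback, args))
        self._seq += 1

    def run(self):
        """Advance time to each pending event and invoke its callback."""
        while self._events:
            when, _, callback, args = heapq.heappop(self._events)
            self.now = when
            callback(*args)


completed = []


def on_load_done(queue, tag):
    """Hypothetical completion handler: record when each request finished."""
    completed.append((queue.now, tag))


queue = EventQueue()
# The core does not know the latency in advance; the memory model decides it.
queue.schedule(30, on_load_done, queue, "load A")
queue.schedule(5, on_load_done, queue, "load B")
queue.run()
print(completed)
```

The point of the pattern is that the requester never assumes a fixed latency: "load B" is issued second but completes first, and the core only reacts when its callback is invoked.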

Citations: 88

An empirical study of the scalability aspects of instruction distribution algorithms for clustered processors

2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS. Pub Date : 1900-01-01 DOI: 10.1109/ISPASS.2001.990696

Aneesh Aggarwal, M. Franklin

Abstract: In the sub-micron technology era, wire delays are becoming much more important than gate delays, making it particularly attractive to go for decentralized processors. A number of algorithms have already been proposed for distributing instructions among multiple clusters. In this paper we qualitatively and quantitatively analyze the effect of various hardware parameters on the scalability of different instruction distribution algorithms. Using a set of realistic system parameters, we examine performance differences resulting from different distribution algorithms as well as from specific implementation issues such as the type of interconnect, the fetch size, the cluster issue width, and the cluster window size. Our studies have found that those distribution algorithms that perform relatively better with 4 or fewer clusters are generally not the best ones for a larger number of clusters. Also, the relative performance and scalability of the algorithms are sensitive to different hardware parameters. We also found that, among the existing algorithms, there is no single algorithm that works uniformly best across all hardware configurations. This motivates the need to develop alternate interconnects and instruction distribution algorithms.

Citations: 36
