M. Conte, A. Trick, J. Gyllenhaal, Wen-mei W. Hwu
{"title":"A study of code reuse and sharing characteristics of Java applications","authors":"M. Conte, A. Trick, J. Gyllenhaal, Wen-mei W. Hwu","doi":"10.1109/WWC.1998.809356","DOIUrl":"https://doi.org/10.1109/WWC.1998.809356","url":null,"abstract":"Presents a detailed characterization of Java application and applet workloads in terms of reuse and sharing of Java code at the program, class and method level. In order to expose more sharing opportunities, techniques for detecting code equivalence (even in the presence of minor code changes or constant pool index differences) are also proposed and examined. The analyzed application workload consists of the recently released SPECjvm98 benchmarks, and the applet workload is derived from three extensive searches of the Internet between May 1997 and May 1998 using an enhanced Web crawler. Analysis of these workloads reveals several new code sharing and optimization opportunities.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132411071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the use of trace sampling for architectural studies of desktop applications","authors":"P. Crowley, J. Baer","doi":"10.1109/WWC.1998.809355","DOIUrl":"https://doi.org/10.1109/WWC.1998.809355","url":null,"abstract":"Examines the feasibility of performing architectural studies with trace sampling for a suite of desktop application traces on Windows NT. This paper makes three contributions: we compare the accuracy of several sampling techniques to determine cache miss rates for these workloads, we present victim cache and branch prediction architecture studies that demonstrate that sampling can be used to drive such studies, and we show how sampling may be used to accurately and efficiently derive the parameters for A. Agarwal et al.'s (1988) analytical cache model. Of the sampling techniques used for the cache miss ratio determinations, the stitch technique, which assumes that the state of the cache at the beginning of a sample is the same as the state at the end of the previous sample, narrowly outperforms the more complex INITMR technique of D.A. Wood et al. (1991) for these workloads. These two techniques are more accurate than the others and are reliable for caches up to 64 KB in size.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114336337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload characterization: motivation, goals and methodology","authors":"L. John, P. Vasudevan, J. Sabarinathan","doi":"10.1109/WWC.1998.809354","DOIUrl":"https://doi.org/10.1109/WWC.1998.809354","url":null,"abstract":"Understanding the characteristics of workloads is extremely important in the design of efficient computer architectures. Accurate characterization of workload behavior leads to the design of improved architectures. The characterization of applications allows one to tune the processor micro-architecture, memory hierarchy and system architecture to suit particular features in programs. Workload characterization also has a significant impact on performance evaluation. Understanding the nature of the workload and its intrinsic features can help to interpret performance measurements and simulation results. Identifying and characterizing the intrinsic properties of an application in terms of its memory access behavior, locality, control flow behavior, instruction-level parallelism, etc. can eventually lead to a program behavior model, which can be used in conjunction with a processor model to do analytical performance modeling of computer systems. In this paper, we describe the objectives of workload characterization and emphasize the importance of obtaining architecture-independent metrics for workloads. A study of memory reference locality using some generic metrics is presented as an example.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128642473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Instruction-level characterization of scientific computing applications using hardware performance counters","authors":"Yong Luo, K. Cameron","doi":"10.1109/WWC.1998.809368","DOIUrl":"https://doi.org/10.1109/WWC.1998.809368","url":null,"abstract":"The paper provides characterization methods based on empirical performance counter measurements. In particular, we provide an instruction-level characterization derived empirically in an effort to demonstrate how architectural limitations in underlying hardware will affect the performance of existing codes. Preliminary results provide promise in code characterization, and empirical/analytical modeling. These include the ability to quantify outstanding miss utilization and stall time attributable to architectural limitations in the CPU and the memory hierarchy. This work further promises insight into quantifying bounds for CPI₀, the ideal CPI with an infinite, perfect L1 cache. In general, if we can characterize workloads using parameters that are independent of architecture, as in this work, then we can more appropriately compare different architectures in an effort to direct processor/code development.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121411179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memory access pattern analysis","authors":"M. Brown, R. Jenevein, N. Ullah","doi":"10.1109/WWC.1998.809366","DOIUrl":"https://doi.org/10.1109/WWC.1998.809366","url":null,"abstract":"A methodology for analyzing memory behavior has been developed for the purpose of evaluating memory system design. MPAT, a memory pattern analysis tool, was used to profile memory transactions of dynamic instruction traces. The paper first describes the memory model and metrics gathered by MPAT. Then the metrics are evaluated in order to determine what hardware and software changes should be made to improve memory system performance.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131553659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parameter value characterization of Windows NT-based applications","authors":"J. Kalamatianos, D. Kaeli, R. Chaiken","doi":"10.1109/WWC.1998.809370","DOIUrl":"https://doi.org/10.1109/WWC.1998.809370","url":null,"abstract":"Compiler optimizations such as code specialization and partial evaluation can be used to effectively exploit identifiable invariance of variable values. To identify the invariant variables that the compiler misses at compile time, value profiling can provide valuable information. We focus on the invariance of procedure parameters for a set of desktop applications run on MS Windows NT 4.0. Most of those applications are non-scientific and execute interactively through a rich GUI. Due to the dynamic nature of this workload, one would expect that parameter values would exhibit an unpredictable behavior. Our work attempts to address this question by measuring the invariance and temporal locality of parameter values. We also measure the invariance of parameter values for four benchmarks from the SPECINT95 suite for comparison.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130329997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance and memory-access characterization of data mining applications","authors":"J. P. Bradford, J. Fortes","doi":"10.1109/WWC.1998.809358","DOIUrl":"https://doi.org/10.1109/WWC.1998.809358","url":null,"abstract":"Characterizes the performance and memory-access behavior of a decision tree induction program, a previously unstudied application used in data mining and knowledge discovery in databases. Performance is studied via RSIM, an execution-driven simulator, for three uniprocessor models that exploit instruction-level parallelism to varying degrees. Several properties of the program are noted. Out-of-order dispatch and multiple issue provide a significant performance advantage: 50%-250% improvement in instructions per cycle (IPC) for out-of-order dispatch vs. in-order dispatch, and 5%-120% improvement in IPC for four-way issue vs. single issue. Multiple issue provides a greater performance improvement for larger L2 cache sizes, when the program is limited by CPU performance; out-of-order dispatch provides a greater performance improvement for smaller L2 cache sizes. The program has a very small instruction footprint: for an 8-kB L1 instruction cache, the instruction miss rate is below 0.1%. A small (8 kB) L1 data cache is sufficient to capture most of the locality of the data references, resulting in L1 miss rates between 10%-20%. Increasing the size of the L2 data cache does not significantly improve performance until a significant fraction (over 1/4) of the data set fits into the L2 cache. Lastly, a procedure is developed for scaling the cache sizes when using scaled-down data sets, allowing the results for smaller data sets to be used to predict the performance of full-sized data sets.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121943570","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing response time of WWW caching proxy servers","authors":"C. Murta, Virgílio A. F. Almeida","doi":"10.1109/WWC.1998.809360","DOIUrl":"https://doi.org/10.1109/WWC.1998.809360","url":null,"abstract":"Caching proxies have an important role in the infrastructure of the World Wide Web (WWW). They save network traffic and reduce Web latency. While they have been largely deployed in the WWW, little is known about Web proxy behavior, and in particular about international proxies. This paper presents an analysis of caching proxy response times, based on logs from real proxies located in the USA, Europe and South America. We found that high variability is an invariant in caching response times across log data of different proxies. We show that the high variability can be explained through a combination of factors, such as the high variability in the file sizes and bandwidth of the client links to the caching proxies. Finally, we discuss the implications of high variability in the proxy behavior on performance characterization and modeling.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134223909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"I/O workload characteristics of modern servers","authors":"W. T. Boyd, R. Recio","doi":"10.1109/WWC.1998.809364","DOIUrl":"https://doi.org/10.1109/WWC.1998.809364","url":null,"abstract":"The design and development of future I/O subsystems needs to keep pace with the rapid rate of improvement in microprocessor technology and changes in system structure. In order to analyze the potential bottlenecks of I/O subsystems, we must first identify and characterize the various workloads that will run on these future systems. This paper has two major goals. The first is to identify and analyze the application environments that are presently being implemented throughout the computing industry. The second goal is to identify and summarize the I/O subsystem characteristics of various present-day and future workloads that typify these application environments.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124466139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generation of 3D graphics workload for system performance analysis","authors":"A. Poursepanj, D. Christie","doi":"10.1109/WWC.1998.809357","DOIUrl":"https://doi.org/10.1109/WWC.1998.809357","url":null,"abstract":"Generation of representative workloads for system performance models has been a challenge for PC system architects who are using trace-driven models. Unlike processor performance models that typically only use a single CPU instruction trace, system models in most cases require traces of CPU, Advanced Graphics Port (AGP), PCI and other bus mastering devices that can access memory. A common approach is to collect bus traces with a logic analyzer. Although this allows the generation of realistic traces, typical analyzer buffer sizes seriously limit the length of contiguous traces. Another problem is that traces collected in a specific system configuration may not be representative of other systems, especially future systems with different timings and/or bus protocols. This paper presents an overview of an approach that can be used to generate long bus traces for performance model stimulus. We describe methods for the characterization of system behavior and for the generation of accurate synthetic graphics traces based on real traces, and give examples of correlated CPU and AGP traces that are synthetic but reflect the characteristics of real CPU/AGP traces.","PeriodicalId":190931,"journal":{"name":"Workload Characterization: Methodology and Case Studies. Based on the First Workshop on Workload Characterization","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1998-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129108900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}