{"title":"Optimizing applications for performance on the Pentium 4 architecture","authors":"A. Mehis, R. Radhakrishnan","doi":"10.1109/WWC.2002.1226494","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226494","url":null,"abstract":"In this paper we characterize the performance impact of using advanced compiler optimizations on the Intel Pentium 4 (P4) processor. Using the Intel C++/FORTRAN compilers, we show that on a variety of benchmarks, advanced compiler optimizations are required to improve performance on the P4 processor. For applications developed using advanced optimizations targeting the earlier Pentium Pro through Pentium III architectures, recompilation is likely required to obtain and/or maximize performance improvements on the P4. The performance-enhancing design features of the P4, although dynamic in nature, require that applications be recompiled using P4 architecture-aware compilers to obtain performance improvements.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"140 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124432397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A methodology for workload characterization of file-sharing peer-to-peer networks","authors":"D. Nogueira, L. Rocha, J. Santos, P. Araújo, V. Almeida, Wagner Meira Jr","doi":"10.1109/WWC.2002.1226500","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226500","url":null,"abstract":"The main characteristic of peer-to-peer (P2P) networks is that the hosts in the network may act as both clients and servers at the same time, being called servents. These networks have been widely adopted for sharing idle computational resources available in the Internet, improving content accessibility while reducing costs and response latency, although host availability and content coherence are not usually guaranteed. As a consequence, traditional workload characterization strategies are not suitable for analyzing and understanding these networks, motivating the design of specific strategies for their characterization. In this article we present a novel workload characterization methodology for P2P networks, which accounts for the main features of these networks. We validate our methodology by characterizing the Gnutella network, which allows us to characterize file-sharing patterns, the availability of the servents, and the search patterns, among others.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131562452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache performance in Java virtual machines: a study of constituent phases","authors":"A. S. Rajan, Shiwen Hu, J. Rubio","doi":"10.1109/WWC.2002.1226496","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226496","url":null,"abstract":"This paper studies the level 1 cache performance of Java programs by analyzing memory reference traces of the SPECjvm98 applications executed by the Latte Java virtual machine. We study in detail Java programs' cache performance of different access types in three JVM phases, under two execution modes, using three cache configurations and two application data sets. We observe that the poor data cache performance in the JIT execution mode is caused by code installation, when the data write miss rate in the execution engine can be as high as 70%. In addition, code installation also deteriorates instruction cache performance during execution of translated code. The high cache miss rate in garbage collection is mainly caused by the large working set and pointer chasing of the garbage collector. A larger data cache is better at eliminating data cache read misses than write misses, and is more effective at improving cache performance in the execution engine than in garbage collection. As the application data set increases in the JIT execution mode, the instruction cache and data cache write miss rates of the execution engine decrease, while the data cache read miss rate of the execution engine increases. On the other hand, the impact of varying the data set on cache performance is not as pronounced in the interpreted mode as in the JIT mode.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128638311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating multimedia instruction performance based on workload characterization and measurement","authors":"Adil Adi Gheewala, Jonathan C.L. Liu, Michael P. Frank, Yen-kuang Chen, Manuel E. Bermudez","doi":"10.1109/WWC.2002.1226498","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226498","url":null,"abstract":"The increasing popularity of multimedia applications has prompted microprocessors to include media-enhancement instructions. In this paper, we describe a methodology to estimate the performance improvement of a new set of media instructions on emerging applications based on workload characterization and measurement. Application programs are characterized into a sequential segment, a vectorizable segment, and extra data moves for utilizing the SIMD capability of new media instructions. Techniques based on benchmarking and measurement on existing systems are used to estimate the execution time of each segment. Based on the measurement results, the speedup and the additional data moves of using the new media instructions can be estimated to help processor architects and designers evaluate different design tradeoffs.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127243966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Specification and implementation of dynamic Web site benchmarks","authors":"C. Amza, A. Chanda, A. Cox, S. Elnikety, R. Gil, K. Rajamani, W. Zwaenepoel, E. Cecchet","doi":"10.1109/WWC.2002.1226489","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226489","url":null,"abstract":"The absence of benchmarks for Web sites with dynamic content has been a major impediment to research in this area. We describe three benchmarks for evaluating the performance of Web sites with dynamic content. The benchmarks model three common types of dynamic content Web sites with widely varying application characteristics: an online bookstore, an auction site, and a bulletin board. For the online bookstore, we use the TPC-W specification. For the auction site and the bulletin board, we provide our own specification, modeled after ebay.com and slashdot.org, respectively. For each benchmark we describe the design of the database and the interactions provided by the Web server. We have implemented these three benchmarks with a variety of methods for building dynamic-content applications, including PHP, Java servlets and EJB (Enterprise Java Beans). In all cases, we use commonly used open-source software. We also provide a client emulator that allows a dynamic content Web server to be driven with various workloads. Our implementations are available freely from our Web site for other researchers to use. These benchmarks can be used for research in dynamic Web and application server design. In this paper, we provide one example of such possible use, namely discovering the bottlenecks for applications in a particular server configuration. Other possible uses include studies of clustering and caching for dynamic content, comparison of different application implementation methods, and studying the effect of different workload characteristics on the performance of servers. With these benchmarks we hope to provide a common reference point for studies in these areas.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124889516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy impact of secure computation on a handheld device","authors":"Zhiyuan Li, Rong-Chang Xu","doi":"10.1109/WWC.2002.1226499","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226499","url":null,"abstract":"Computation offloading is an important approach to reducing energy consumption while improving performance for wireless networked handheld devices. With such an approach, computational tasks are offloaded from the handheld device to a server, depending on the tradeoff between the communication cost and the computation cost. Adding security to the wireless network changes the relative cost of computation and communication. In this paper, we measure the energy consumption characteristics of multimedia applications on a handheld device, supported by computation offloading through a wireless LAN which is secured with IPsec. The measurement indicates that despite the overhead of the security mechanism, offloading remains quite effective as a method to reduce program execution time and energy consumption.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129760890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A detailed comparison of two transaction processing workloads","authors":"","doi":"10.1109/WWC.2002.1226492","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226492","url":null,"abstract":"Commercial applications such as databases and Web servers constitute the most important market segment for high-performance servers. Among these applications, online transaction processing (OLTP) workloads provide a challenging set of requirements for system designs since they often exhibit inefficient executions dominated by a large memory stall component. A number of recent studies have characterized the behavior of transaction processing workloads and proposed architectural features to improve their performance. These studies have typically used a workload based on either the TPC-B or the TPC-C benchmark, with many of them opting for the simpler TPC-B benchmark. Given that the TPC-B and TPC-C workloads exhibit dramatically different characteristics on certain architectural metrics (such as cycles-per-instruction), it becomes important to find out whether the results or conclusions of these previous studies are heavily biased due to their choice of workload. This paper presents a detailed comparison of the debit-credit (modeled after TPC-B) and order-entry (modeled after TPC-C) transaction processing workloads in the context of various architectural choices. Our experiments use the Oracle commercial database engine for running the workloads, with results generated using both full system simulations and actual runs on Alpha multiprocessors. Our results confirm that certain characteristics of these workloads, such as cycles-per-instruction (CPI) and dirty miss frequency, are indeed quite different. Nonetheless, it turns out that the overall impact of most architectural choices (e.g., out-of-order execution, on-chip integration of system modules, chip multiprocessing) is surprisingly similar for the two workloads. Furthermore, the above similarity between the two workloads is sometimes due to non-intuitive effects that would be difficult to predict without conducting the experiment with both workloads. The findings in this paper make it easier to compare results from studies that use one or the other workload. Overall, we observe that for a wide range of architectural decisions that we considered, using the simpler TPC-B workload leads to virtually the same conclusions as using the more complex TPC-C workload. Finally, we show that these same conclusions hold across two generations of the Oracle database engine.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126247727","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing and contrasting a commercial OLTP workload with CPU2000 on IPF","authors":"J. Rupley, M. Annavaram, John DeVale, T. Diep, B. Black","doi":"10.1109/WWC.2002.1226493","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226493","url":null,"abstract":"With the recent introduction of Itanium Processor Family (IPF) microprocessors for enterprise servers, it is imperative to understand the behavior of server class applications. This paper analyzes the behavior of the Oracle Database Benchmark (ODB), an online transaction processing (OLTP) workload, and compares it with SPEC CPU2000. This study examines code mix, instruction and data supply, and value locality. The results show that while IPF's bundle constraints cause a large injection of NOPs into the code stream, IPF's register stack engine successfully reduces the number of memory operations by nearly 50%. The control-flow predictability of ODB is better than that of CPU2000, in spite of ODB's large active branch footprint. Due to ODB's large memory footprint, cache misses (particularly instruction cache misses) are a much more serious problem than in CPU2000.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130971262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating synthetic trace models using locality surfaces","authors":"E. S. Sorenson, J. Flanagan","doi":"10.1109/WWC.2002.1226491","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226491","url":null,"abstract":"In this paper we analyze several synthetic trace generation models using locality surfaces. The locality surfaces let us discover what elements of the real trace were accurately modeled and what features were not. None of the models examined are very good at retaining the locality of the real trace. We can see from cache simulation results that if the locality surface does not accurately reflect the locality of the real workload, the cache performance statistics will not be accurate either.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124559216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking a site with realistic workload","authors":"G. Ballocca, R. Politi, V. Russo, G. Ruffo","doi":"10.1109/WWC.2002.1226490","DOIUrl":"https://doi.org/10.1109/WWC.2002.1226490","url":null,"abstract":"The rapidly growing number of Web users and the consequent importance of capacity planning have led to the development of Web benchmarking tools. One common criticism of this approach is that the synthetic workload produced by Web stressing tools is far from realistic. This paper deals with a benchmarking methodology based on workload characterization generated from log files. A customer behavior model graph (CBMG) was proposed by Menascé et al. (1999) as a workload characterization of an e-commerce site. We discuss how the CBMG methodology has a wider field of application and how to use this model to efficiently improve a fully integrated Web stressing tool. We also evaluate the differences between our approach and other models based on different characterizations.","PeriodicalId":320576,"journal":{"name":"2002 IEEE International Workshop on Workload Characterization","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114592672","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}