{"title":"Automatic memory hierarchy characterization","authors":"Clark L. Coleman, J. Davidson","doi":"10.1109/ISPASS.2001.990684","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990684","url":null,"abstract":"As the gap between memory speed and processor speed grows, program transformations to improve the performance of the memory system have become increasingly important. To understand and optimize memory performance, researchers and practitioners in performance analysis and compiler design require a detailed understanding of the memory hierarchy of the target computer system. Unfortunately, accurate information about the memory hierarchy is not easy to obtain. Vendor microprocessor documentation is often incomplete, vague, or worse, erroneous in its description of important on-chip memory parameters. Furthermore, today’s computer systems contain complex, multi-level memory systems where the processor is but one component of the memory system. The accuracy of the documentation on the complete memory system is also lacking. This paper describes the implementation of a portable program that automatically determines all of a computer system’s important memory hierarchy parameters. Automatic determination of memory hierarchy parameters is shown to be superior to reliance on vendor data. The robustness and portability of the approach is demonstrated by determining and validating the memory hierarchy parameters for a number of different computer systems, using several of the emerging performance counter application programming interfaces.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126454967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical usage testing applied to mobile network verification","authors":"A. Ost, Dorien van Logchem","doi":"10.1109/ISPASS.2001.990694","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990694","url":null,"abstract":"In this paper, we present the experiences that have been made with a new method for testing software systems, statistical usage testing (StUT). StUT relies on the creation and execution of real traffic models towards the system under test in order to avoid the manual development of test cases. In contrast to several studies that have been performed so far, our approach is based on a mature test automation platform. We describe our platform and present an approach based on stochastic Petri nets for the formulation of usage models. We then report on the application of the environment towards mobile network verification and provide an analysis of the results in terms of costs and benefits of StUT.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133465663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An evaluation of search tree techniques in the presence of caches","authors":"Costin Iancu, A. Acharya","doi":"10.1109/ISPASS.2001.990682","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990682","url":null,"abstract":"Two techniques underlie the design of commonly used search-tree algorithms: move commonly accessed items closer to the root (move-to-front) and keep the tree balanced (keep-it-balanced). The move-to-front technique tries to improve the performance for the common case by reducing the number of operations required to retrieve heavily accessed items. The keep-it-balanced technique tries to improve worst-case performance by reducing the maximum number of operations required to look up an item in the tree. In this paper, we evaluate these techniques in the presence of a cache hierarchy. As representatives of move-to-front algorithms we use splay trees. As representatives of keep-it-balanced algorithms we use B-Trees and B*-Trees. In addition to classic versions of these techniques, we evaluate variants that have been optimized for a cache hierarchy. To drive our evaluation, we use a suite of synthetic datasets. These data-sets vary primarily in the degree of locality found in the request stream and the operation mix. The major result of this paper is a qualitative performance difference analysis of the two classes of algorithms based on the expected data input characteristics. Based on this analysis, we can predict which algorithm performs faster for a large class of input data-sets. We also present guidelines for choosing implementation parameters in the presence of a cache hierarchy for all","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127119275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geist: a generator for e-commerce & internet server traffic","authors":"K. Kant, V. Tewari, R. Iyer","doi":"10.1109/ISPASS.2001.990676","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990676","url":null,"abstract":"This paper describes Geist, a traffic generator for stress testing of web servers. The generator provides a large number of dialable parameters that allow traffic characteristics to range from simple static web-page browsing to the transactional traffic seen by e-commerce front end servers. Unlike other traffic generators, our generator concentrates on the characteristics of the aggregate traffic arriving at the server, which allows for better control of the scaling properties of the traffic and a more scalable generation. In this paper, we describe the traffic characterization and generation process that Geist is based on. We also present the performance of our current implementation, instances of its usage and directions for future work.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132374434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The effects of context switching on branch predictor performance","authors":"M. Co, K. Skadron","doi":"10.1109/ISPASS.2001.990679","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990679","url":null,"abstract":"This paper shows that context switching is not a significant factor to be considered when performing general branch prediction studies. Branch prediction allows for speculative execution by increasing available instruction level parallelism (ILP) and hiding the time required to resolve branch conditions. Accurate simulation of branch prediction is important because branch prediction strongly influences the behavior of processor structures. For this study, a timesharing framework was developed by modifying SimpleScalar's branch predictor simulator. A thorough characterization of the effects of branch predictor configuration, branch predictor area, and time slice length is provided. As further verification, branch predictor performance with and without flushing the predictor structures is compared. Experiments show that operating system context switches have little effect on branch prediction rate when using time slices representative of today's operating systems. Our findings show that this results from the fact that time slices are much larger than the training time required by the branch predictor structures. For all predictor configurations tested, the predictors train in under 128K instructions with or without flushing the branch predictor structures.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117024368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding control flow transfer and its predictability in java processing","authors":"Tao Li, L. John","doi":"10.1109/ISPASS.2001.990678","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990678","url":null,"abstract":"An in-depth look and understanding of control flow transfer and its predictability can guide architects to adapt control flow prediction hardware in Java processing or finely tune the performance of JVM software on general purpose machines. To our knowledge, this paper provides the first insight of branch behavior on a standard Java Virtual Machine with real workloads. Employing a complete system simulation environment, we profile branch execution characteristics and quantify the performance of a wide range of prediction schemes on both user and kernel code. The impact of different JVM styles (JIT compiler and interpreter) on branch behavior is also studied. We find that: (1) Kernel branches constitute a significant portion of total branch execution in Java processing; (2) Kernel and user code favor different prediction mechanisms; (3) Java processing exercises fairly large number of branch sites and large control flow footprint compared with the execution of benchmarks such as SPECint95; (4) A major part of the dynamic indirect branches are multiple target (polymorphic) branches. Target addresses of indirect branches, especially those in interpreting mode are highly interleaved and cause high BTB misprediction.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"58 S2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114093684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early design phase power/performance modeling through statistical simulation","authors":"L. Eeckhout, K. D. Bosschere","doi":"10.1109/ISPASS.2001.990669","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990669","url":null,"abstract":"Microprocessor design time and effort are getting impractical due to the huge number of simulations that need to be done to evaluate various processor configurations for various workloads. An early design stage methodology could be useful to efficiently cull huge design spaces to identify regions of interest to be further explored using more accurate simulations. In such an early design stage methodology, power consumption should be considered besides performance, since power consumption is becoming a key design issue for midrange and high-end microprocessor designs. In this paper, we propose to use statistical simulation as an early design stage methodology that considers both performance and power. We evaluate the applicability and the accuracy of this methodology and we show that statistical simulation is indeed capable of identifying a region of energy-efficient architectures. In addition, we demonstrate that this methodology can be used to explore workload design spaces in terms of power/performance by varying program characteristics that are hard to vary using real programs.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133467222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using program and user information to improve file prediction performance","authors":"Tsozen Yeh, D. Long, S. Brandt","doi":"10.1109/ISPASS.2001.990685","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990685","url":null,"abstract":"Correct prediction of file accesses can improve system performance by mitigating the relative speed difference between CPU and disks. This paper discusses Program-based Last Successor (PLS) and presents Program- and User-based Last Successor (PULS), file prediction algorithms that utilize information about the program and user that access the files. Our simulation results show that PLS makes 21% fewer incorrect predictions and PULS makes 24% fewer incorrect predictions than last-successor with roughly the same number of correct predictions that last-successor makes. The cache space wasted on incorrect predictions can be reduced accordingly. We also show that a cache using the Least Recently Used (LRU) caching algorithm can perform better when the PULS is applied. In some cases, a cache using LRU and either PLS or PULS performs better than a cache up to 40 times larger using LRU alone.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115304078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"About the sensitivity of the HLRC-DU protocol on diff and page sizes","authors":"S. Petit, J. Sahuquillo, A. Pont","doi":"10.1109/ISPASS.2001.990675","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990675","url":null,"abstract":"Recent research on software distributed shared memory systems has focused on consistency protocols for improving performance. Home Lazy Release Consistency (HLRC) protocols have been widely adopted due to their performance advantages. Usually, these protocols invalidate pages through write notices. Variants of these protocols propose some criterion to update data of the corresponding pages instead of invalidating. In a previous paper, we proposed the HLRC-DU protocol, which is an improved version of the HLRC protocol. The HLRC-DU embeds update information in those write notices whose corresponding diff size is less than a given threshold, invalidating the remainder. The threshold trades off network bandwidth with update performance. In this paper, we study the HLRC-DU protocol’s sensitivity to page size and the threshold size selection. Our results show that while the page size slightly impacts performance, our protocols are highly sensitive to the threshold value.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128329553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating internal memory fragmentation for java programs under the binary buddy policy","authors":"Therapon Skotiniotis, J. M. Chang","doi":"10.1109/ISPASS.2001.990681","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990681","url":null,"abstract":"Dynamic memory management has been an important part of a large class of computer programs and with the recent popularity of Object Oriented programming languages, more specifically Java, high performance dynamic memory management algorithms continue to be of great importance. In this paper, an analysis of Java programs, provided by the SPECjvm98 benchmark suite, and their behavior, as this relates to fragmentation, is performed. Based on this analysis, a new model is proposed which allows the estimation of the total internal fragmentation that Java systems will incur prior to the program's execution. The proposed model can also accommodate any variation of segregated lists implementation. A comparison with a previously introduced fragmentation model and with actual fragmentation values is performed. The idea of a test-bed application that will use the proposed model to provide to programmers/developers the ability to know, prior to a program's execution, the fragmentation and memory utilization of their programs, is also introduced.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122255883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}