{"title":"Performance characterization experience of multi-tier e-business systems using queuing operational analysis","authors":"Deep K. Buch, Vladimir M. Pentkovski","doi":"10.1109/ISPASS.2001.990677","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990677","url":null,"abstract":"This paper describes the authors’ experience of performance characterization of typical modern multitier e-business systems. The experiments are performed for both Microsoft * DNA and Java * server environments using commonly available measurement tools. Queuing operational analysis was applied in order to characterize system performance, scalability and bottlenecks. Close correlation of predictions with real measurements was observed. The technique allows quantitative comparison of different e-business middleware. Further, using this quantitative approach helps to provide insights into system behavior.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"123 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133385411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An evaluation of the POSIX trace standard implemented in RT-linux","authors":"A. Terrasa, Ignacio Pachés, A. García-Fornes","doi":"10.1109/ISPASS.2001.990672","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990672","url":null,"abstract":"As real-time applications become more complex, the availability of event tracing mechanisms becomes more important in order to perform run-time monitoring. The recently approved standard POSIX 1003.1q defines a common application interface for trace management. In this paper we present our experience with the implementation of a subset of the approved tracing standard that conforms to the POSIX Minimal Real-Time System Profile. The implementation has been done in the RT-Linux operating system, in order to obtain a real-time kernel with standard tracing services that conforms to the Minimal Profile.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116041370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Locality-aware predictive scheduling of network processors","authors":"T. Wolf, M. Franklin","doi":"10.1109/ISPASS.2001.990693","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990693","url":null,"abstract":"Demands for flexible processing have moved general-purpose processing into the data path of networks. Processor schedulers have a great impact on the performance of these real-time systems. We present measurements that show that the workload of a network processor is highly regular and predictable. Processing time predictions, based on these measurements, can be used in scheduling together with information about locality in the instruction stream to significantly improve throughput performance. We propose two scheduling schemes, Locality-Aware and Locality-Aware Predictive, that try to avoid cold caches when scheduling packets for processors. Simulations of the schedulers using packet processing times obtained from an operational network processor show the tradeoffs between the algorithms and their performance improvements over First-Come-First-Serve scheduling.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121835878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cycle accurate thread timer for linux environment","authors":"Yang Qian, W. Srisa-an, Therapon Skotiniotis, J. M. Chang","doi":"10.1109/ISPASS.2001.990674","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990674","url":null,"abstract":"Due to the increasing popularity of Java in client/server environments, most of today's server applications are multithreaded. Thus, research focusing on the performance analysis of multi-threaded environments has become increasingly important. Since per-thread information can be crucial in such analysis, measuring tools are needed to provide per-thread information that may include cycle-based timers and filters to eliminate tracing overhead. In this paper, a Cycle Accurate Thread Timing for Linux Environment (CATTLE) is presented. This approach provides a cycle-accurate timer with functions to filter out tracing overhead by coordinating efforts from both kernel and user applications. In this scheme, the kernel keeps track of accurate thread timing, while applications inform the kernel which part of the execution is to be measured. To demonstrate the tool's functionality, two case studies are provided, which include measuring latencies incurred by malloc calls and monitoring potential memory heap contention in multithreaded-multiprocessor environments.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130685847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel simulation of multiprocessor execution: implementation and results for simplescalar","authors":"N. Manjikian","doi":"10.1109/ISPASS.2001.990691","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990691","url":null,"abstract":"In research that relies on simulation in order to predict and compare the performance of proposed computing architectures, multiprocessor simulations have inherent concurrency that can be exploited for parallelization in order to reduce the execution time for a simulation. This paper describes the initial experiences in first introducing multiprocessor simulation support for the detailed out-of-order target simulator from the popular SimpleScalar tool set, and then parallelizing the resulting simulator for execution on a multiprocessor host system. The extended simulator provides the basis for further detailed modeling of target systems with multiple out-of-order processors through parallel simulation on a multiprocessor host. For experiments conducted on a Sun Enterprise 3500 platform, the measured speedup for the initial version of the parallelized simulator reached 4.4 on 6 processors for a selected application from the SPLASH-2 benchmark.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134007564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance analysis using pipeline visualization","authors":"Christopher T. Weaver, K. Barr, Eric Marsman, Dan Ernst, T. Austin","doi":"10.1109/ISPASS.2001.990670","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990670","url":null,"abstract":"High-end microprocessors are increasing in complexity to push the limits of speed and performance. As a result, analyzing these complex systems can be an arduous task. Architectural simulators, acting as software processors, are able to run programs and give statistics about the performance of the code on the design. While these statistics are valuable for identifying problems, they often do not provide the fidelity necessary to diagnose the cause of sluggish performance. This paper presents a cross-platform tool that can be used to visualize the flow of instructions through an architectural processor pipeline model. The Graphical Pipeline Viewer, GPV, uses a colorized pipeline trace display to deliver an efficient diagnostic and analysis environment. The resource view of the tool, which can display cycle statistics, aids in distinguishing possible bottlenecks and architectural trade-offs. As such, the tool is able to suggest code and architectural modifications to increase program performance.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132103511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Workload characterization of multithreaded java servers","authors":"Yue Luo, L. John","doi":"10.1109/ISPASS.2001.990688","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990688","url":null,"abstract":"Java has gained popularity in the commercial server arena, but the characteristics of Java server applications are not well understood. In this research, we characterize the behavior of two Java server benchmarks, VolanoMark and SPECjbb2000, on a Pentium III system with the latest Java Hotspot Server VM. We compare Java server applications with SPECint2000 and also investigate the impact of multithreading by increasing the number of clients. Java servers are seen to exhibit poor instruction access behavior, including high instruction miss rate, high ITLB miss rate, high BTB miss rate and, as a result, high I-stream stalls. With increasing number of threads, the instruction behavior improves, suggesting increased locality of access. But the resource stalls increase and eventually dwarf the diminishing I-stream stalls. With more clients, the instruction count per unit work increases and becomes a hindrance to the scalability of the servers.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114463757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Balancing thoughput and fairness in SMT processors","authors":"Kun Luo, J. Gummaraju, M. Franklin","doi":"10.1109/ISPASS.2001.990695","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990695","url":null,"abstract":"Simultaneous Multithreading (SMT) is an execution model that executes multiple threads in parallel within a single processor pipeline. Usually, an SMT processor uses shared instruction queues to collect instructions from the different threads. Hence, an SMT processor’s performance depends on how the instruction fetch unit fills these instruction queues every cycle. In the recent past, many schemes have been proposed for fetching instructions into the SMT pipeline. These schemes focussed on increasing the throughput by using the number of instructions and the number of low confidence branch predictions currently in the pipeline, to decide which threads to fetch from. The goal of this paper is to investigate fetch policies that find a balance between fairness and throughput. We present metrics to quantify fairness. We then discuss techniques to use a set of pipeline system variables for achieving balanced throughput and fairness. Finally, we evaluate several fetch policies. Our evaluation confirms that many of our fetch policies provide a good balance between throughput and fairness.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126447197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"How to compare the performance of two SMT microarchitectures","authors":"Yiannakis Sazeides, Toni Juan","doi":"10.1109/ISPASS.2001.990697","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990697","url":null,"abstract":"In this paper we discuss methods and metrics for comparing the performance of two simultaneous multithreading microarchitectures. We identify conditions under which the instructions-per-cycle metric may be misleading for comparing two simultaneous multithreading microarchitectures for the same amount of work. Part of the problem is isolated to the definition of what is the same work. When simulating a mix of independent programs under the same initial conditions on two different simultaneous multithreading microarchitectures there are two approaches to ensure the work of the two runs is the same: constant-work-per-thread or variable-work-per-thread. For both approaches the total number of instructions in the run is constant; however, for the first, the number of instructions from each thread is also constant, whereas for the second it is not. We claim that: (a) when simulating two microarchitectures with the constant-work-per-thread approach, the instructions-per-cycle metric is sufficient to compare them to establish the microarchitecture with the best performance, (b) when the variable-work-per-thread approach is used, the instructions-per-cycle metric may be inadequate for comparing performance. We attribute this to the inability of the instructions-per-cycle metric to account for differences in the load-balance of the two runs. A new performance metric, SMT-speedup, is proposed that enables accurate comparison of the performance of two simultaneous multithreading microarchitectures for runs with different load-balance. The new metric considers the load-balance in terms of the size and performance of each thread. In light of the insight gained in this paper we contend that a simultaneous multithreading microarchitecture may need to trade off throughput and load-balance to achieve the best performance.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"s3-38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130160427","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation and evaluation of a best-effort scheduling algorithm in an embedded real-time system","authors":"Peng Li, B. Ravindran, T. Hegazy","doi":"10.1109/ISPASS.2001.990671","DOIUrl":"https://doi.org/10.1109/ISPASS.2001.990671","url":null,"abstract":"This paper describes an implementation and the performance evaluation of the DASA/ND best-effort scheduling algorithm [4] in the μClinux/μCsimm micro-controller system. Experimental results under synthetic workload show that in some cases, the DASA/ND scheduler outperforms both the EDF (Earliest Deadline First) and the RMS (Rate Monotonic Scheduling) schedulers [7]. Meanwhile, the system performance gracefully degrades as the aggregate CPU load increases. However, the scheduling overhead, in general, is not negligible, which may lead to poorer performance than non-best-effort scheduling algorithms. It is found that the scheduling overhead strongly depends on the task set properties. Using the regression analysis technique, we developed a statistical model accounting for the scheduling overhead. We show that this model, combined with a simulation tool, can predict the system performance well.","PeriodicalId":104148,"journal":{"name":"2001 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS.","volume":"145 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132137962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}