{"title":"Two-level soft error vulnerability prediction on SMT/CMP architectures","authors":"Lide Duan, Lu Peng, Bin Li","doi":"10.1109/IISWC.2011.6114203","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114203","url":null,"abstract":"Architectural Vulnerability Factor (AVF) [3] quantifies the probability that a raw soft error finally produces a visible error in the program output. It is often used by computer designers as an important reliability metric at the architectural level. However, the AVF measurement is extremely expensive in terms of hardware and computation. In this paper, we characterize and predict a program's AVF under resource contention and sharing with other programs running on Simultaneous Multithreading (SMT) and Chip-Multiprocessor (CMP) architectures.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"47 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133266971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Suzumura, Koji Ueno, Hitoshi Sato, K. Fujisawa, S. Matsuoka
{"title":"Performance characteristics of Graph500 on large-scale distributed environment","authors":"T. Suzumura, Koji Ueno, Hitoshi Sato, K. Fujisawa, S. Matsuoka","doi":"10.1109/IISWC.2011.6114175","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114175","url":null,"abstract":"Graph500 is a new benchmark for supercomputers based on large-scale graph analysis, which is becoming an important form of analysis in many real-world applications. Graph algorithms run well on supercomputers with shared memory. For the Linpack-based supercomputer rankings, TOP500 reports that heterogeneous and distributed-memory super-computers with large numbers of GPGPUs are becoming dominant. However, the performance characteristics of large-scale graph analysis benchmarks such as Graph500 on distributed-memory supercomputers have so far received little study. This is the first report of a performance evaluation and analysis for Graph500 on a commodity-processor-based distributed-memory supercomputer. We found that the reference implementation “replicated-csr” based on distributed level-synchronized breadth-first search solves a large free graph problem with 231 vertices and 235 edges (approximately 2.15 billon vertices and 34.3 billion edges) in 3.09 seconds with 128 nodes and 3,072 cores. This equates to 11 giga-edges traversed per second. We describe the algorithms and implementations of the reference implementations of Graph500, and analyze the performance characteristics with varying graph sizes and numbers of computer nodes and different implementations. Our results will also contribute to the development of optimized algorithms for the coming exascale machines.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127659069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalability analysis of enterprise javaworkloads on a multi-core system","authors":"X. Guerin, Yanbin Liu, Parijat Dube, Seetharami R. Seelam, Pierre-Andre Paumelle","doi":"10.1109/IISWC.2011.6114202","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114202","url":null,"abstract":"This paper shows that not only lock contentions, but also adherence issues between two layers of a software stack impact scalability of Java enterprise workloads.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129822159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beau Piccart, A. Georges, H. Blockeel, L. Eeckhout
{"title":"Ranking commercial machines through data transposition","authors":"Beau Piccart, A. Georges, H. Blockeel, L. Eeckhout","doi":"10.1109/IISWC.2011.6114192","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114192","url":null,"abstract":"The performance numbers reported by benchmarking consortia and corporations provide little or no insight into the performance of applications of interest that are not part of the benchmark suite. This paper describes data transposition, a novel methodology for addressing this ubiquitous benchmarking problem. Data transposition predicts the performance for an application of interest on a target machine based on its performance similarities with the industry-standard benchmarks on a limited number of predictive machines. The key idea of data transposition is to exploit machine similarity rather than workload similarity as done in prior work, i.e., data transposition identifies a predictive machine that is most similar to the target machine of interest for predicting performance for the application of interest. We demonstrate the accuracy and effectiveness of data transposition using the SPEC CPU2006 benchmarks and a set of 117 commercial machines. We report that the machine ranking obtained through data transposition correlates well with the machine ranking obtained using measured performance numbers (average correlation coefficient of 0.93). Not only does data transposition improve average correlation, we also demonstrate that data transposition is more robust towards outlier benchmarks, i.e., the worst-case correlation coefficient improves from 0.59 by prior art to 0.71. More concretely, using data transposition to predict the top-1 machine for an application of interest leads to the best performing machine for most workloads (average deficiency of 1.2% and max deficiency of 24.8% for one benchmark), whereas prior work leads to deficiencies over 100% for some workloads.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126879000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A performance study on operator-based stream processing systems","authors":"Miyuru Dayarathna, Souhei Takeno, T. Suzumura","doi":"10.1109/IISWC.2011.6114204","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114204","url":null,"abstract":"This short paper compares and contrasts performance characteristics of System S and S4, two stream processing systems which use operator-based programming model. Our aim is to investigate and characterize which architecture is better for handling which type of stream processing workloads and observe the reasons for such characteristics.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132978358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huafeng Xi, Jianfeng Zhan, Zhen Jia, Xuehai Hong, Lei Wang, Lixin Zhang, Ninghui Sun, Gang Lu
{"title":"Characterization of real workloads of web search engines","authors":"Huafeng Xi, Jianfeng Zhan, Zhen Jia, Xuehai Hong, Lei Wang, Lixin Zhang, Ninghui Sun, Gang Lu","doi":"10.1109/IISWC.2011.6114193","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114193","url":null,"abstract":"Search is the most heavily used web application in the world and is still growing at an extraordinary rate. Understanding the behaviors of web search engines, therefore, is becoming increasingly important to the design and deployment of data center systems hosting search engines. In this paper, we study three search query traces collected from real world web search engines in three different search service providers. The first part of our study is to uncover the patterns hidden in the query traces by analyzing the variations, frequencies, and locality of query requests. Our analysis reveals that, contradicted to some previous studies, real-world query traces do not follow well-defined probability models, such as Poisson distribution and log-normal distribution. The second part of our study is to deploy the real query traces and three synthetic traces generated using probability models proposed by other researchers on a Nutch based search engine. The measured performance data from the deployments further confirm that synthetic traces do not accurately reflect the real traces. We develop an evaluation tool that can collect performance metrics on-line with negligible overhead. The performance metrics include average response time, CPU utilization, Disk accesses, and cycles-per-instructions, etc. The third of our study is to compare the search engine with representative benchmarks, namely Gridmix, SPECweb2005, TPC-C, SPECCPU2006, and HPCC, with respect to basic architecture-level characteristics and performance metrics, such as instruction mix, processor pipeline stall breakdown, memory access latency, and disk accesses. The experimental results show that web search engines have a high percentage of load/store instructions, but have good cache/memory performance. We hope those results presented in this paper will enable system designers to gain insights on optimizing systems hosting search engines.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115389491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Program Interferometry","authors":"Zhe Wang, Daniel A. Jiménez","doi":"10.1109/IISWC.2011.6114177","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114177","url":null,"abstract":"Modern microprocessors have many microarchitectural features. Quantifying the performance impact of one feature such as dynamic branch prediction can be difficult. On one hand, a timing simulator can predict the difference in performance given two different implementations of the technique, but simulators can be quite inaccurate. On the other hand, real systems are very accurate representations of themselves, but often cannot be modified to study the impact of a new technique.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116983506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}