{"title":"Performance characterization of the NAS Parallel Benchmarks in OpenCL","authors":"Sangmin Seo, Gangwon Jo, Jaejin Lee","doi":"10.1109/IISWC.2011.6114174","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114174","url":null,"abstract":"Heterogeneous parallel computing platforms, which are composed of different processors (e.g., CPUs, GPUs, FPGAs, and DSPs), are widening their user base in all computing domains. With this trend, parallel programming models need to achieve portability across different processors as well as high performance with reasonable programming effort. OpenCL (Open Computing Language) is an open standard and emerging parallel programming model to write parallel applications for such heterogeneous platforms. In this paper, we characterize the performance of an OpenCL implementation of the NAS Parallel Benchmark suite (NPB) on a heterogeneous parallel platform that consists of general-purpose CPUs and a GPU. We believe that understanding the performance characteristics of conventional workloads, such as the NPB, with an emerging programming model (i.e., OpenCL) is important for developers and researchers to adopt the programming model. We also compare the performance of the NPB in OpenCL to that of the OpenMP version. We describe the process of implementing the NPB in OpenCL and optimizations applied in our implementation. Experimental results and analysis show that the OpenCL version has different characteristics from the OpenMP version on multicore CPUs and exhibits different performance characteristics depending on different OpenCL compute devices. The results also indicate that the application needs to be rewritten or re-optimized for better performance on a different compute device although OpenCL provides source-code portability.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115497450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing the effects of compiler optimizations on application reliability","authors":"M. Demertzi, M. Annavaram, Mary W. Hall","doi":"10.1109/IISWC.2011.6114178","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114178","url":null,"abstract":"As transistor sizes decrease, transient faults are becoming a significant concern for processor designers. A rich body of research has focused on ways to estimate the vulnerability of systems to transient errors and on techniques to reduce their sensitivity to soft errors. In this research, we analyze how compiler optimizations impact the expected number of failures during the execution of an application. Typically, optimizations have two effects. First, they increase structures occupancies by allowing more instructions in flight, which in turn increases their susceptibility to soft errors. Additionally, they decrease execution time, decreasing the time during which the application is exposed to transient errors. In particular, we focus on how optimizations impact occupancies in three processor structures, namely the Reorder Buffer, the Instruction Fetch Queue and the Load Store Queue. We explain the interplay between compiler and reliability by studying the changes in the code made by the compiler and the resulting responses at the microarchitectural level. Results from this research allow us to make decisions to keep an application within its performance goals and its vulnerability during its runtime within a well defined FIT target.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114409615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient software-based online phase classification","authors":"Andreas Sembrant, David Eklov, Erik Hagersten","doi":"10.1109/IISWC.2011.6114207","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114207","url":null,"abstract":"Many programs exhibit execution phases with time-varying behavior. Phase detection has been used extensively to find short and representative simulation points, used to quickly get representative simulation results for long-running applications. Several proposals for hardware-assisted phase detection have also been proposed to guide various forms of optimizations and hardware configurations.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125536599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing soft-error vulnerability on GPGPU microarchitecture","authors":"Jingweijia Tan, Nilanjan Goswami, Tao Li, Xin Fu","doi":"10.1109/IISWC.2011.6114182","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114182","url":null,"abstract":"The general-purpose computation on graphic processing units (GPGPU) becomes increasingly popular due to their high computational throughput for data parallel applications. Modern GPU architectures have limited capability for error detection and tolerance since they are originally designed for graphics processing. However, the rigorous execution correctness is required for general-purpose applications. This makes reliability a growing concern in GPGPU architecture design. With CMOS processing technologies continuously scaling down to the nano-scale, on-chip soft error rate (SER) has been predicted to increase exponentially. GPGPUs with hundreds of cores integrated in a single chip are prone to manifest high SER. This paper explores a first step to characterize GPGPU reliability in light of soft errors. We develop GPGPU-SODA (GPGPU Software Dependability Analysis), a framework to estimate the soft-error vulnerability of GPGPU microarchitecture. By using GPGPU-SODA, we observe that several microarchitecture structures in GPGPUs exhibit high soft-error susceptibility, and the structure vulnerability is sensitive to workload characteristics (e.g. branch divergences, memory coalescing). We further investigate several architectural optimizations. We find that both dynamic warp formation and increasing the number of threads supported by GPU largely affect the GPGPU soft-error robustness. However, changing the warp scheduling policy has minor impact on the structure vulnerability. The observations made in this study provide designers the useful guidance to build resilient GPGPUs: a comprehensive resiliency solution for GPGPUs should consider the entire GPGPU design instead of just focusing on a particular structure.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126648114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architecture comparisons between Nvidia and ATI GPUs: Computation parallelism and data communications","authors":"Ying Zhang, Lu Peng, Bin Li, J. Peir, Jianmin Chen","doi":"10.1109/IISWC.2011.6114180","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114180","url":null,"abstract":"In recent years, modern graphics processing units have been widely adopted in high performance computing areas to solve large scale computation problems. The leading GPU manufacturers Nvidia and ATI have introduced series of products to the market. While sharing many similar design concepts, GPUs from these two manufacturers differ in several aspects on processor cores and the memory subsystem. In this paper, we conduct a comprehensive study to characterize the architectural differences between Nvidia's Fermi and ATI's Cypress and demonstrate their impact on performance. Our results indicate that these two products have diverse advantages that are reflected in their performance for different sets of applications. In addition, we also compare the energy efficiencies of these two platforms since power/energy consumption is a major concern in the high performance computing.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"256 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114029527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using cycle stacks to understand scaling bottlenecks in multi-threaded workloads","authors":"W. Heirman, Trevor E. Carlson, Shuai Che, K. Skadron, L. Eeckhout","doi":"10.1109/IISWC.2011.6114195","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114195","url":null,"abstract":"This paper proposes a methodology for analyzing parallel performance by building cycle stacks. A cycle stack quantifies where the cycles have gone, and provides hints towards optimization opportunities. We make the case that this is particularly interesting for analyzing parallel performance: understanding how cycle components scale with increasing core counts and/or input data set sizes leads to insight with respect to scaling bottlenecks due to synchronization, load imbalance, poor memory performance, etc. We present several case studies illustrating the use of cycle stacks. As a subsequent step, we further extend the methodology to analyze sets of parallel workloads using statistical data analysis, and perform a workload characterization to understand behavioral differences across benchmark suites. We analyze the SPLASH-2, PARSEC and Rodinia benchmark suites and conclude that the three benchmark suites cover similar areas in the workload space. However, scaling behavior of these benchmarks towards larger input sets and/or higher core counts is highly dependent on the benchmark, the way in which the inputs have been scaled, and on the machine configuration.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133925572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MEVBench: A mobile computer vision benchmarking suite","authors":"Jason Clemons, Haishan Zhu, S. Savarese, T. Austin","doi":"10.1109/IISWC.2011.6114206","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114206","url":null,"abstract":"The growth in mobile vision applications, coupled with the performance limitations of mobile platforms, has led to a growing need to understand computer vision applications. Computationally intensive mobile vision applications, such as augmented reality or object recognition, place significant performance and power demands on existing embedded platforms, often leading to degraded application quality. With a better understanding of this growing application space, it will be possible to more effectively optimize future embedded platforms. In this work, we introduce and evaluate a custom benchmark suite for mobile embedded vision applications named MEVBench. MEVBench provides a wide range of mobile vision applications such as face detection, feature classification, object tracking and feature extraction. To better understand mobile vision processing characteristics at the architectural level, we analyze single and multithread implementations of many algorithms to evaluate performance, scalability, and memory characteristics. We provide insights into the major areas where architecture can improve the performance of these applications in embedded systems.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"346 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134057910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thread reinforcer: Dynamically determining number of threads via OS level monitoring","authors":"K. Pusukuri, Rajiv Gupta, L. Bhuyan","doi":"10.1109/IISWC.2011.6114208","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114208","url":null,"abstract":"It is often assumed that to maximize the performance of a multithreaded application, the number of threads created should equal the number of cores. While this may be true for systems with four or eight cores, this is not true for systems with larger number of cores. Our experiments with PARSEC programs on a 24-core machine demonstrate this. Therefore, dynamically determining the appropriate number of threads for a multithreaded application is an important unsolved problem. In this paper we develop a simple technique for dynamically determining appropriate number of threads without recompiling the application or using complex compilation techniques or modifying Operating System policies. We first present a scalability study of eight programs from PARSEC conducted on a 24 core Dell PowerEdge R905 server running OpenSolaris.2009.06 for numbers of threads ranging from a few threads to 128 threads. Our study shows that not only does the maximum speedup achieved by these programs vary widely (from 3.6x to 21.9x), the number of threads that produce maximum speedups also vary widely (from 16 to 63 threads). By understanding the overall speedup behavior of these programs we identify the critical Operating System level factors that explain why the speedups vary with the number of threads. As an application of these observations, we develop a framework called “Thread Reinforcer” that dynamically monitors program's execution to search for the number of threads that are likely to yield best speedups. Thread Reinforcer identifies optimal or near optimal number of threads for most of the PARSEC programs studied and as well as for SPEC OMP and PBZIP2 programs.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115547644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling and predicting application performance on hardware accelerators","authors":"Mitesh R. Meswani, L. Carrington, D. Unat, A. Snavely, S. Baden, S. Poole","doi":"10.1109/IISWC.2011.6114198","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114198","url":null,"abstract":"Systems with hardware accelerators speedup applications by offloading certain compute operations that can run faster on accelerators. Thus, it is not surprising that many of top500 supercomputers use accelerators. However, in addition to procurement cost, significant programming and porting effort is required to realize the potential benefit of such accelerators. Hence, before building such a system it is prudent to answer the question ‘what is the projected performance benefit from accelerators for workloads of interest?’ We address this question by way of a performance-modeling framework, which predicts realizable application performance on accelerators speedily and accurately without going to the considerable effort of porting and tuning.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116163128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A tool for characterizing and succinctly representing the data access patterns of applications","authors":"C. Olschanowsky, A. Snavely, L. Carrington","doi":"10.1109/IISWC.2011.6114173","DOIUrl":"https://doi.org/10.1109/IISWC.2011.6114173","url":null,"abstract":"Application address streams contain a wealth of information that can be used to characterize the behavior of applications. However, the collection and handling of address streams is complicated by their size and the cost of collecting them. We present PSnAP, a compression scheme specifically designed for capturing the fine-grained patterns that occur in well structured, memory intensive, high performance computing applications. PSnAP profiles are human readable and reveal a great deal of information about the application memory behavior. In addition to providing insight to application behavior the profiles can be used to replay a proxy synthetic address stream for analysis. We demonstrate that the synthetic address streams mimic very closely the behavior of the originals.","PeriodicalId":367515,"journal":{"name":"2011 IEEE International Symposium on Workload Characterization (IISWC)","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130479882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}