2013 IEEE International Symposium on Workload Characterization (IISWC): latest papers (page 2)

Characterizing multi-threaded applications for designing sharing-aware last-level cache replacement policies

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704665

R. Natarajan, Mainak Chaudhuri

Abstract: Recent years have seen a large volume of proposals on managing the shared last-level cache (LLC) of chip-multiprocessors (CMPs). However, most of these proposals primarily focus on reducing the amount of destructive interference between competing independent threads of multi-programmed workloads. While a few of these studies evaluate the proposed policies on shared-memory multi-threaded applications, they do not improve constructive cross-thread sharing of data in the LLC. In this paper, we characterize a set of multi-threaded applications drawn from the PARSEC, SPEC OMP, and SPLASH-2 suites with the goal of introducing sharing-awareness in LLC replacement policies. We motivate our characterization study by quantifying the potential contributions of the shared and the private blocks toward the overall volume of the LLC hits in these applications and show that the shared blocks are more important than the private blocks. Next, we characterize the amount of sharing-awareness enjoyed by recent proposals compared to the optimal policy. We design and evaluate a generic oracle that can be used in conjunction with any existing policy to quantify the potential improvement that can come from introducing sharing-awareness. The oracle analysis shows that introducing sharing-awareness reduces the number of LLC misses incurred by the least-recently-used (LRU) policy by 6% and 10% on average for a 4MB and 8MB LLC respectively. A realistic implementation of this oracle requires the LLC controller to have the capability to accurately predict, at the time a block is filled into the LLC, whether the block will be shared during its residency in the LLC. We explore the feasibility of designing such a predictor based on the address of the fill and the program counter of the instruction that triggers the fill. Our sharing behavior predictability study of two history-based fill-time predictors that use block addresses and program counters concludes that achieving acceptable levels of accuracy with such predictors will require other architectural and/or high-level program semantic features that have strong correlations with active sharing phases of the LLC blocks.
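
To make the idea of a history-based fill-time predictor concrete, here is a minimal sketch of one possible shape such a predictor could take. It is not the design evaluated in the paper: the class name, the table size, the counter width, the PC hash, and the train-at-eviction policy are all assumptions chosen for illustration.

```python
class FillTimeSharingPredictor:
    """Hypothetical PC-indexed table of saturating counters (illustrative only)."""

    def __init__(self, table_bits=10, counter_max=3):
        self.mask = (1 << table_bits) - 1
        self.counter_max = counter_max
        self.threshold = counter_max // 2
        # Every counter starts at the threshold, i.e. weakly "private".
        self.table = [self.threshold] * (1 << table_bits)

    def _index(self, pc):
        # Simple fold of the program counter; a real design would pick its own hash.
        return (pc ^ (pc >> 10)) & self.mask

    def predict_shared(self, pc):
        """At fill time: True if the block is predicted to be shared while resident."""
        return self.table[self._index(pc)] > self.threshold

    def train(self, pc, was_shared):
        """At eviction: update with the sharing outcome observed during residency."""
        i = self._index(pc)
        if was_shared:
            self.table[i] = min(self.counter_max, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


predictor = FillTimeSharingPredictor()
fill_pc = 0x400123                      # made-up program counter value
predictor.train(fill_pc, was_shared=True)
print(predictor.predict_shared(fill_pc))
```

The abstract's conclusion is that predictors built only on block addresses and program counters do not reach acceptable accuracy, so a table like this should be read as the baseline being questioned rather than as a recommended design.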

Citations: 18

ACE: Abstracting, characterizing and exploiting datacenter power demands

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704669

Di Wang, Chuangang Ren, Sriram Govindan, A. Sivasubramaniam, B. Urgaonkar, A. Kansal, Kushagra Vaid

Abstract: Peak power management of datacenters has tremendous cost implications. While numerous mechanisms have been proposed to cap power consumption, real datacenter power consumption data is scarce. Prior studies have either used a small set of applications and/or servers, or presented data that is at an aggregate scale from which it is difficult to design and evaluate new and existing optimizations. To address this gap, we collect power measurement data at multiple spatial and fine-grained temporal resolutions from several geo-distributed datacenters of Microsoft Corporation over 6 months. We conduct aggregate analysis of this data to study its statistical properties. We find evidence of self-similarity in power demands, statistical multiplexing effects, and correlations with the cooling power that caters to the IT equipment. With workload characterization a key ingredient for systems design and evaluation, we note the importance of better abstractions for capturing power demands, in the form of peaks and valleys. We identify attributes for peaks and valleys, and important correlations across these attributes that can influence the choice and effectiveness of different power capping techniques. We characterize these attributes and their correlations, showing the burstiness of small duration peaks, and the importance of not ignoring the rare but more stringent or long peaks. The correlations between peaks and valleys suggest the need for techniques to aggregate and collectively handle them. With the wide scope of exploitability of such characteristics for power provisioning and optimizations, we illustrate its benefits with two specific case studies. The first shows how peaks can be differentially handled based on our peak and valley characterization using existing approaches, rather than a one-size-fits-all solution. The second illustrates a simple capacity provisioning strategy for energy storage using the peak and valley characteristics.
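
The peak-and-valley abstraction can be illustrated with a small sketch that segments a power trace around a cap. The abstract does not list the attributes the authors use, so the ones below (start index, duration, and extreme value) are assumptions, as are the function name and the sample numbers.

```python
from itertools import groupby


def peaks_and_valleys(trace, cap):
    """Split a power trace into runs above the cap (peaks) and at or below it (valleys).

    Returns a list of (kind, start_index, duration, extreme_value) tuples, where
    extreme_value is the maximum of a peak or the minimum of a valley.
    """
    segments = []
    start = 0
    for is_peak, run in groupby(trace, key=lambda power: power > cap):
        run = list(run)
        extreme = max(run) if is_peak else min(run)
        segments.append(("peak" if is_peak else "valley", start, len(run), extreme))
        start += len(run)
    return segments


# Made-up samples (arbitrary units) against a hypothetical cap of 100.
for segment in peaks_and_valleys([90, 95, 110, 120, 105, 80, 70, 115], cap=100):
    print(segment)
```

Per-segment tuples like these are the kind of input on which duration distributions or peak-to-valley correlations could then be computed.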

Citations: 17

Platform-independent analysis of function-level communication in workloads

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704685

Siddharth Nilakantan, Mark Hempstead

Citations: 7

iBench: Quantifying interference for datacenter applications

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704667

Christina Delimitrou, C. Kozyrakis

Abstract: Interference between co-scheduled applications is one of the major reasons that modern datacenters (DCs) operate at low utilization. DC operators traditionally side-step interference either by disallowing colocation altogether and providing isolated server instances, or by requiring the users to express resource reservations, which are often exaggerated to counter-balance the unpredictability in the quality of allocated resources. Understanding, reducing and managing interference can significantly impact the manner in which these large-scale systems operate. We present iBench, a novel workload suite that helps quantify the pressure different applications put on various shared resources, and similarly the pressure they can tolerate in these resources. iBench consists of a set of carefully crafted benchmarks that induce interference of increasing intensity in resources that span the CPU, cache hierarchy, memory, storage and networking subsystems. We first validate the effect that iBench workloads have on performance against a wide spectrum of DC applications. Then, we use iBench to demonstrate the importance of considering interference in a set of challenging problems that range from DC scheduling and server provisioning, to resource-efficient application development and scheduling for heterogeneous CMPs. In all cases, quantifying interference with iBench results in significant performance and/or efficiency improvements. We plan to release iBench under a free software license.

Citations: 106

Quantifying the energy cost of data movement in scientific applications

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704670

Gokcen Kestor, R. Gioiosa, D. Kerbyson, A. Hoisie

Citations: 110

On the performance and energy-efficiency of multi-core SIMD CPUs and CUDA-enabled GPUs

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704683

Ronald Duarte, Resit Sendag, F. J. Vetter

Abstract: This paper explores the performance and energy efficiency of CUDA-enabled GPUs and multi-core SIMD CPUs using a set of kernels and full applications. Our implementations efficiently exploit both SIMD and thread-level parallelism on multi-core CPUs and the computational capabilities of CUDA-enabled GPUs. We discuss general optimization techniques for our CPU-only and CPU-GPU platforms. To fairly study performance and energy-efficiency, we also used two applications which utilize several kernels. Finally, we present an evaluation of the implementation effort required to efficiently utilize multi-core SIMD CPUs and CUDA-enabled GPUs for the benchmarks studied. Our results show that kernel-only performance and energy-efficiency could be misleading when evaluating parallel hardware; therefore, true results must be obtained using full applications. We show that, after all respective optimizations have been made, the best performing and energy-efficient platform varies for different benchmarks. Finally, our results show that PPEH (Performance gain Per Effort Hours), our newly introduced metric, can effectively be used to quantify efficiency of implementation effort across different benchmarks and platforms.
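
The abstract names the PPEH metric but does not give its formula. The sketch below assumes one plausible reading, namely speedup over a baseline divided by the hours spent on the optimized implementation. The function name, its parameters, and the sample numbers are hypothetical, and the paper's exact definition may differ.

```python
def ppeh(baseline_seconds, optimized_seconds, effort_hours):
    """Assumed reading of PPEH: speedup over the baseline per hour of effort."""
    if baseline_seconds <= 0 or optimized_seconds <= 0 or effort_hours <= 0:
        raise ValueError("run times and effort must be positive")
    speedup = baseline_seconds / optimized_seconds
    return speedup / effort_hours


# Made-up example: a 4x speedup that took 8 hours of implementation effort.
print(ppeh(baseline_seconds=120.0, optimized_seconds=30.0, effort_hours=8.0))
```

Under this reading, a port that reaches a modest speedup quickly can score higher than one that reaches a larger speedup after far more engineering time.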

Citations: 7

WiBench: An open source kernel suite for benchmarking wireless systems

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704678

Qi Zheng, Yajing Chen, R. Dreslinski, C. Chakrabarti, A. Anastasopoulos, S. Mahlke, T. Mudge

Abstract: The rapid growth in the number of mobile devices and the higher data rate requirements of mobile subscribers have made wireless signal processing a key driving application of mobile computing technology. To design better mobile platforms and the supporting wireless infrastructure, it is very important for computer architects and system designers to understand and characterize the performance of existing and upcoming wireless protocols. In this paper, we present a newly developed open-source benchmark suite called WiBench. It consists of a wide range of signal processing kernels used in many mainstream standards such as 802.11, WCDMA and LTE. The kernels include FFT/IFFT, MIMO, channel estimation, channel coding, constellation mapping, etc. Each kernel is a self-contained configurable block which can be tuned to meet the different system requirements. Several standard channel models have also been included to study system performance, such as the bit error rate. The suite also contains an LTE uplink system as a representative example of a wireless system that can be built using these kernels. WiBench is provided in C++ to make it easier for computer architects to profile and analyze the system. We characterize the performance of WiBench to illustrate how it can be used to guide hardware system design. Architectural analyses on each individual kernel and on the entire LTE uplink are performed, indicating the hotspots, available parallelism, and runtime performance. Finally, a MATLAB version is also included for debugging purposes.
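
As an illustration of one kernel type the abstract lists, constellation mapping, here is a minimal Gray-coded QPSK mapper. It is not taken from WiBench, which is written in C++ and whose interfaces are not described in this listing. The function name and the bit-to-sign convention (bit 0 maps to +1, bit 1 maps to -1 on each axis) are assumptions for the sketch.

```python
import math


def qpsk_map(bits):
    """Map an even-length sequence of 0/1 bits to unit-energy QPSK symbols."""
    if len(bits) % 2 != 0:
        raise ValueError("QPSK consumes bits in pairs")
    scale = 1 / math.sqrt(2)
    # First bit of each pair sets the sign of the real part, second bit the imaginary part.
    return [
        complex(1 - 2 * bits[i], 1 - 2 * bits[i + 1]) * scale
        for i in range(0, len(bits), 2)
    ]


symbols = qpsk_map([0, 0, 1, 1, 0, 1])
print(symbols)
```

Each output symbol has magnitude 1, so average symbol energy stays constant regardless of the input bit pattern.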

Citations: 33

Performance implications of System Management Mode

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704682

Brian Delgado, K. Karavanic

Citations: 21

Power and performance of GPU-accelerated systems: A closer look

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-09-01 DOI: 10.1109/IISWC.2013.6704675

Yukitaka Abe, Hiroshi Sasaki, S. Kato, Koji Inoue, M. Edahiro, M. Peres

Citations: 7

Characterizing data analysis workloads in data centers

2013 IEEE International Symposium on Workload Characterization (IISWC) Pub Date : 2013-07-30 DOI: 10.1109/IISWC.2013.6704671

Zhen Jia, Lei Wang, Jianfeng Zhan, Lixin Zhang, Chunjie Luo

Abstract: As the amount of data explodes rapidly, more and more corporations are using data centers to make effective decisions and gain a competitive edge. Data analysis applications play a significant role in data centers, and hence it has become increasingly important to understand their behaviors in order to further improve the performance of data center computer systems. In this paper, after investigating the three most important application domains in terms of page views and daily visitors, we choose eleven representative data analysis workloads and characterize their micro-architectural characteristics by using hardware performance counters, in order to understand the impacts and implications of data analysis workloads on systems equipped with modern superscalar out-of-order processors. Our study on the workloads reveals that data analysis applications share many inherent characteristics, which place them in a different class from desktop (SPEC CPU2006), HPC (HPCC), and service workloads, including traditional server workloads (SPECweb2005) and scale-out service workloads (four among six benchmarks in CloudSuite), and accordingly we give several recommendations for architecture and system optimizations. On the basis of our workload characterization work, we released a benchmark suite named DCBench for typical datacenter workloads, including data analysis and service workloads, with an open-source license on our project home page at http://prof.ict.ac.cn/DCBench. We hope that DCBench is helpful for performing architecture and small-to-medium scale system research for datacenter computing.

Citations: 124