2017 IEEE International Symposium on Workload Characterization (IISWC): Latest Publications

Evaluating energy storage for a multitude of uses in the datacenter
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167752
Iyswarya Narayanan, Di Wang, A. Mamun, A. Sivasubramaniam, H. Fathy, Sean James
{"title":"Evaluating energy storage for a multitude of uses in the datacenter","authors":"Iyswarya Narayanan, Di Wang, A. Mamun, A. Sivasubramaniam, H. Fathy, Sean James","doi":"10.1109/IISWC.2017.8167752","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167752","url":null,"abstract":"Datacenters often are a power utility's largest consumers, and are expected to participate in several power management scenarios with diverse characteristics in which Energy Storage Devices (ESDs) are expected to play important roles. Different ESD technologies exist, including little explored technologies such as flow batteries, that offer different performance characteristics in cost, size, and environmental impact. While prior works in datacenter ESD literature have considered one of usage aspect, technology, performance metric (typically cost), the whole three-dimensional space is little explored. Towards understanding this design space, this paper presents first such study towards joint characterization of ESD usages based on their provisioning and operating demands, under ideal and realistic ESD technologies, and quantify their impact on datacenter performance. We expect our work can help datacenter operators to characterize this three-dimensional space in a systematic manner, and make design decisions targeted towards cost-effective and environmental impact aware datacenter energy management.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126058798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
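The (usage, technology, metric) design space discussed in the ESD paper above can be illustrated with a toy enumeration. The sketch below is a minimal illustration under invented assumptions; the usage scenarios, technology parameters, and cost model are placeholders, not data or methodology from the paper.

```python
# Toy sweep of the (usage, ESD technology) space under a simple cost model.
# All numbers are illustrative placeholders, not data from the paper.

USAGES = {               # peak power to shave (kW), energy to buffer (kWh)
    "peak_shaving":        {"power_kw": 100, "energy_kwh": 50},
    "demand_response":     {"power_kw": 60,  "energy_kwh": 120},
    "outage_ride_through": {"power_kw": 200, "energy_kwh": 30},
}

TECHNOLOGIES = {         # cost per kW, cost per kWh, round-trip efficiency
    "lead_acid":    {"usd_per_kw": 300, "usd_per_kwh": 200, "efficiency": 0.80},
    "li_ion":       {"usd_per_kw": 400, "usd_per_kwh": 350, "efficiency": 0.92},
    "flow_battery": {"usd_per_kw": 600, "usd_per_kwh": 150, "efficiency": 0.75},
}

def provisioning_cost(usage, tech):
    """Capital cost = power-rated cost + energy-rated cost, inflated by efficiency losses."""
    energy_needed = usage["energy_kwh"] / tech["efficiency"]
    return usage["power_kw"] * tech["usd_per_kw"] + energy_needed * tech["usd_per_kwh"]

for usage_name, usage in USAGES.items():
    best = min(TECHNOLOGIES.items(), key=lambda kv: provisioning_cost(usage, kv[1]))
    print(f"{usage_name:>20}: cheapest technology = {best[0]} "
          f"(${provisioning_cost(usage, best[1]):,.0f})")
```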
Determining work partitioning on closely coupled heterogeneous computing systems using statistical design of experiments
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167766
Yectli A. Huerta, Brent A. Swartz, D. Lilja
{"title":"Determining work partitioning on closely coupled heterogeneous computing systems using statistical design of experiments","authors":"Yectli A. Huerta, Brent A. Swartz, D. Lilja","doi":"10.1109/IISWC.2017.8167766","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167766","url":null,"abstract":"In a closely coupled heterogeneous computing system the work is shared amongst all available computing resources. One challenge is to find an optimal division of work between the two or more very different kinds of processing units, each with their own optimal settings. We show that through the use of statistical techniques, a systematic search of the parameter space can be conducted. These techniques can be applied to variables that are categorical or continuous in nature and do not rely on the standard assumptions of linear models, mainly that the response variable can be described as a linear combination of the regression coefficients. Our search technique, when applied to the HPL benchmark, resulted in a performance gain of 14.5% over previously reported results.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"221 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130508671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
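As a rough illustration of the design-of-experiments idea in the entry above, the sketch below runs a full-factorial sweep over a CPU/accelerator work split and a block-size factor, then picks the best-performing combination. The split values, block sizes, and the synthetic performance model are placeholder assumptions, not the paper's factors or results.

```python
# Full-factorial design over two factors: work split ratio and block size.
# The measure_gflops() model is synthetic; in practice each design point
# would be an actual HPL run on the heterogeneous system.
import itertools

SPLITS = [0.5, 0.6, 0.7, 0.8, 0.9]   # fraction of work given to the accelerator
BLOCK_SIZES = [128, 192, 256, 384]   # categorical tuning factor

def measure_gflops(split, block):
    # Placeholder response surface with an interior optimum (illustration only).
    return 1000 * (1 - (split - 0.75) ** 2) - 0.5 * abs(block - 256)

results = {(s, b): measure_gflops(s, b)
           for s, b in itertools.product(SPLITS, BLOCK_SIZES)}
(best_split, best_block), best_perf = max(results.items(), key=lambda kv: kv[1])
print(f"best split={best_split}, block={best_block}: {best_perf:.1f} GFLOP/s")
```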
Congestion-aware memory management on NUMA platforms: A VMware ESXi case study
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167772
Jagadish B. Kotra, Seongbeom Kim, Kamesh Madduri, M. Kandemir
{"title":"Congestion-aware memory management on NUMA platforms: A VMware ESXi case study","authors":"Jagadish B. Kotra, Seongbeom Kim, Kamesh Madduri, M. Kandemir","doi":"10.1109/IISWC.2017.8167772","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167772","url":null,"abstract":"He VMware ESXi hypervisor attracts a wide range of customers and is deployed in domains ranging from desktop computing to server computing. While the software systems are increasingly moving towards consolidation, hardware has already transitioned into multi-socket Non-Uniform Memory Access (NUMA)-based systems. The marriage of increasing consolidation and the multi-socket based systems warrants low-overhead, simple and practical mechanisms to detect and address performance bottlenecks, without causing additional contention for shared resources such as performance counters. In this paper, we propose a simple, practical and highly accurate, dynamic memory latency probing mechanism to detect memory congestion in a NUMA system. Using these dynamic probed latencies, we propose congestion-aware memory allocation, congestion-aware memory migration, and a combination of these two techniques. These proposals, evaluated on Intel Westmere (8 nodes) and Intel Haswell (2 nodes) using various workloads, improve the overall performance on an average by 7.2% and 9.5% respectively.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"84 19","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120824824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 6
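A minimal sketch of the congestion-aware placement idea described above: periodically probe per-node memory latency and steer new allocations (or trigger migration) toward the least-congested node. The probe values, threshold, and policy structure are illustrative assumptions, not ESXi internals.

```python
# Toy congestion-aware placement policy driven by probed per-node latencies.
# Latencies come from a stub; a real system would measure them with timed
# loads to per-node probe buffers.
import random

NODES = [0, 1, 2, 3]
MIGRATION_THRESHOLD_NS = 40   # assumed gap that justifies migrating pages

def probe_latency_ns(node):
    # Stand-in for a timed pointer-chase probe of memory on `node`.
    base = {0: 90, 1: 110, 2: 250, 3: 95}[node]   # node 2 is "congested"
    return base + random.uniform(-5, 5)

def pick_allocation_node(latencies):
    """Allocate new memory on the node with the lowest probed latency."""
    return min(latencies, key=latencies.get)

def should_migrate(current_node, latencies):
    """Migrate pages away if the current node is much slower than the best node."""
    best = min(latencies.values())
    return latencies[current_node] - best > MIGRATION_THRESHOLD_NS

latencies = {n: probe_latency_ns(n) for n in NODES}
print("probed latencies (ns):", {n: round(v, 1) for n, v in latencies.items()})
print("allocate on node:", pick_allocation_node(latencies),
      "| migrate away from node 2:", should_migrate(2, latencies))
```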
Work as a team or individual: Characterizing the system-level impacts of main memory partitioning
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167773
Eojin Lee, Jongwook Chung, Daejin Jung, Sukhan Lee, Sheng Li, Jung Ho Ahn
{"title":"Work as a team or individual: Characterizing the system-level impacts of main memory partitioning","authors":"Eojin Lee, Jongwook Chung, Daejin Jung, Sukhan Lee, Sheng Li, Jung Ho Ahn","doi":"10.1109/IISWC.2017.8167773","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167773","url":null,"abstract":"Modern multi-core systems employ shared memory architecture, entailing problems related to the main memory such as row-buffer conflicts, time-varying hot-spots across memory channels, and superfluous switches between reads and writes originating from different cores. There have been proposals to solve these problems by partitioning main memory across banks and/or channels such that a DRAM bank is dedicated to a single core, being free from inter-thread row-buffer conflicts. However, those studies either focused on only multi-programmed workloads on which cores operate independently, not cooperatively, or specific hardware configurations with a limited number of degrees of freedom in the number of main memory banks, ranks, and channels. We analyze the influence of memory partitioning on systems with various degrees of banks, ranks, and channels using multi-threaded and multi-programmed workloads, making the following key observations. Bank partitioning is beneficial when memory-intensive applications in a multi-programmed workload have similar characteristics in bank-level parallelism, bandwidth, and capacity demands. Any diversity in these demands with a limited memory capacity greatly diminishes the bank partitioning benefits. As memory access/usage patterns across cores are more easily manageable on multi-threaded workloads, bank partitioning is more often effective with memory intensive multithreaded applications. Channel partitioning becomes effective when the reduction of the negative impacts of time-varying hotspots across memory channels outweighs the load imbalance due to partitioning. We also demonstrate the benefits of rank partitioning with regard to minimizing read-write switches on multi-threaded applications where cores can coordinate memory accesses.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133604141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
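To make the bank-partitioning idea above concrete, the sketch below shows one way a physical-address-to-bank mapping could be restricted so that each core owns a disjoint set of banks. The bit layout, frame granularity, and bank/core counts are assumptions for illustration, not the configurations evaluated in the paper.

```python
# Illustrative bank-partitioning address mapping: each core is confined to
# a private subset of DRAM banks, so threads on different cores cannot
# cause row-buffer conflicts in each other's banks.
NUM_BANKS = 16
NUM_CORES = 4
BANKS_PER_CORE = NUM_BANKS // NUM_CORES

def default_bank(phys_addr):
    """Unpartitioned mapping: bank index taken directly from address bits."""
    return (phys_addr >> 12) % NUM_BANKS          # 4 KiB frame granularity

def partitioned_bank(phys_addr, core_id):
    """Partitioned mapping: fold the bank index into the core's private banks."""
    local = (phys_addr >> 12) % BANKS_PER_CORE
    return core_id * BANKS_PER_CORE + local

addr = 0x12345678
for core in range(NUM_CORES):
    print(f"core {core}: addr {addr:#x} -> bank {partitioned_bank(addr, core)} "
          f"(unpartitioned would be bank {default_bank(addr)})")
```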
LORE: A loop repository for the evaluation of compilers
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167779
Zhi Chen, Zhangxiaowen Gong, J. Szaday, D. Wong, D. Padua, A. Nicolau, A. Veidenbaum, Neftali Watkinson, Zehra Sura, Saeed Maleki, J. Torrellas, G. DeJong
{"title":"LORE: A loop repository for the evaluation of compilers","authors":"Zhi Chen, Zhangxiaowen Gong, J. Szaday, D. Wong, D. Padua, A. Nicolau, A. Veidenbaum, Neftali Watkinson, Zehra Sura, Saeed Maleki, J. Torrellas, G. DeJong","doi":"10.1109/IISWC.2017.8167779","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167779","url":null,"abstract":"Although numerous loop optimization techniques have been designed and deployed in commercial compilers in the past, virtually no common experimental infrastructure nor repository exists to help the compiler community evaluate the effectiveness of these techniques. This paper describes a repository, LORE, that maintains a large number of C language for loop nests extracted from popular benchmarks, libraries, and real applications. It also describes the infrastructure that builds and maintains the repository. Each loop nest in the repository has been compiled, transformed, executed, and measured independently. These loops cover a variety of properties that can be used by the compiler community to evaluate loop optimizations using a broad and representative collection of loops. To illustrate the usefulness of the repository, we also present two example applications. One is assessing the capabilities of the auto-vectorization features of three widely used compilers. The other is measuring the performance difference of a compiler across different versions. These applications prove that the repository is valuable for identifying the strengths and weaknesses of a compiler and for quantitatively measuring the evolution of a compiler.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129933756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
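One of the example uses named in the LORE abstract, assessing auto-vectorization, can be sketched as a speedup comparison between a loop compiled with and without vectorization. The harness below is a generic illustration, not part of the LORE infrastructure; it assumes gcc is on PATH, and the kernel and flags are placeholders.

```python
# Rough harness for one LORE-style use case: measure the auto-vectorization
# speedup of a single loop nest by compiling it with and without
# vectorization. Assumes gcc is available; the kernel is illustrative.
import os
import subprocess
import tempfile
import time

KERNEL = r"""
#include <stdio.h>
#define N 4096
float a[N], b[N], c[N];
int main(void) {
    for (int r = 0; r < 20000; r++)
        for (int i = 0; i < N; i++)
            c[i] = a[i] * b[i] + c[i];
    printf("%f\n", c[N - 1]);
    return 0;
}
"""

def build_and_time(extra_flags):
    with tempfile.TemporaryDirectory() as d:
        src, exe = os.path.join(d, "loop.c"), os.path.join(d, "loop")
        with open(src, "w") as f:
            f.write(KERNEL)
        subprocess.run(["gcc", "-O2", *extra_flags, src, "-o", exe], check=True)
        start = time.perf_counter()
        subprocess.run([exe], check=True, stdout=subprocess.DEVNULL)
        return time.perf_counter() - start

scalar = build_and_time(["-fno-tree-vectorize"])
vector = build_and_time(["-ftree-vectorize"])
print(f"scalar {scalar:.3f}s, vectorized {vector:.3f}s, "
      f"speedup {scalar / vector:.2f}x")
```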
Exploring the impact of memory block permutation on performance of a crossbar ReRAM main memory
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167774
M. Ramezani, Nima Elyasi, M. Arjomand, M. Kandemir, A. Sivasubramaniam
{"title":"Exploring the impact of memory block permutation on performance of a crossbar ReRAM main memory","authors":"M. Ramezani, Nima Elyasi, M. Arjomand, M. Kandemir, A. Sivasubramaniam","doi":"10.1109/IISWC.2017.8167774","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167774","url":null,"abstract":"Owing to the advantages of low standby power and high scalability, ReRAM technology is considered as a promising replacement for conventional DRAM in future manycore systems. In order to make ReRAM highly scalable, the memory array has to have a crossbar array structure, which needs a specific access mechanism for activating a row of memory when reading/writing a data block from/to it. This type of memory access would cause Sneak Current that would lead to voltage drop on the memory cells of the activated row, i.e., the cells which are far from the write drivers experience more voltage drop compared to those close to them. This results in a nonuniform access latency for the cells of the same row. To address this problem, we propose and evaluate a scheme that exploits the non-uniformity of write access pattern of the workloads. More specifically, based on our extensive characterization of write patterns to the cache lines and memory pages of 20 CPU workloads, we recognized that (i) on each main memory access, just a few cache lines of the activated row need to be updated on a write-back, and more importantly, there is a temporal and spatial locality of the writes to the cache lines; and (ii) all pages of the memory footprint of an application do not see the same write counts during the execution of the workload. Motivated by these characteristics, we then evaluate different intra-page memory block permutations in order to improve the performance of a crossbar ReRAM-based main memory. Our results collectively show that, by applying some types of intra-page memory block permutation, the access latency to a ReRAM-based main memory can be reduced up to 50% when running the SPEC CPU2006 workloads.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121504935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
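The intra-page permutation idea above can be illustrated with a toy remapping that places the most frequently written blocks at the positions closest to the write drivers, which in a crossbar suffer the least voltage drop and therefore the lowest latency. The write counts and the linear latency model below are assumptions for illustration, not the paper's characterization data.

```python
# Toy intra-page block permutation: hot (frequently written) blocks are
# remapped to crossbar positions near the write drivers, where the assumed
# per-position write latency is lowest. Numbers are illustrative.
BLOCKS_PER_PAGE = 8
# Assumed write latency (ns) by physical position: grows with distance
# from the write drivers due to sneak-current voltage drop.
POSITION_LATENCY_NS = [50 + 10 * i for i in range(BLOCKS_PER_PAGE)]

# Hypothetical per-block write counts observed for one page.
write_counts = [120, 3, 45, 900, 7, 310, 0, 60]

def build_permutation(counts):
    """Map the hottest logical block to the fastest physical position."""
    hot_order = sorted(range(len(counts)), key=lambda b: counts[b], reverse=True)
    perm = [0] * len(counts)
    for position, logical_block in enumerate(hot_order):
        perm[logical_block] = position
    return perm

def avg_write_latency(counts, perm):
    total_writes = sum(counts) or 1
    return sum(c * POSITION_LATENCY_NS[perm[b]]
               for b, c in enumerate(counts)) / total_writes

identity = list(range(BLOCKS_PER_PAGE))
perm = build_permutation(write_counts)
print("permutation:", perm)
print(f"avg write latency: identity {avg_write_latency(write_counts, identity):.1f} ns, "
      f"permuted {avg_write_latency(write_counts, perm):.1f} ns")
```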
The Microsoft Catapult project
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167769
Derek Chiou
{"title":"The microsoft catapult project","authors":"Derek Chiou","doi":"10.1109/IISWC.2017.8167769","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167769","url":null,"abstract":"All new Microsoft Azure and Bing servers are being deployed with an FPGA that sits both between the server and the data center network and on the PCIe bus. The FPGA is currently being used to accelerate networking on Azure machines and search on Bing machines, but could very quickly and easily be retargeted to other uses as needed. In this talk, I will describe how we decided on this architecture, the new data center model it introduces, and the benefits it provides.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122851765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 19
Approximeter: Automatically finding and quantifying code sections for approximation
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167765
Riad Akram, A. Muzahid
{"title":"Approximeter: Automatically finding and quantifying code sections for approximation","authors":"Riad Akram, A. Muzahid","doi":"10.1109/IISWC.2017.8167765","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167765","url":null,"abstract":"Approximate computing is getting a lot of traction especially for its potential in improving power, performance, and scalability of a computing system. However, prior work heavily relies upon a programmer to identify code sections where various approximation techniques can be applied. Such an approach is error prone and cannot scale well beyond small applications. In this paper, we contribute with a tool, called Approximeter, to automatically identify and quantify code sections where approximation can be used and to what extant. The tool works by first identifying potential approximable functions and then, injecting errors at appropriate locations. The tool runs Monte Carlo experiments to quantify statistical relation between injected error and corresponding output accuracy. The tool also provides a rough estimate of potential performance gain from approximating a certain function. Finally, it ranks the approximable functions based on their error tolerance and performance gain.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124410563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 2
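A bare-bones version of the Monte Carlo error-injection experiment described above might look like the sketch below: perturb a candidate function's result with controlled relative error and record how the end-to-end output quality degrades. The target function, error magnitudes, and quality metric are placeholder assumptions, not Approximeter's actual instrumentation.

```python
# Minimal Monte Carlo error-injection experiment: perturb one candidate
# function's output with controlled relative error and measure the impact
# on final output quality. Everything here is an illustrative stand-in.
import random
import statistics

def candidate(x):
    """The function whose approximability is being probed."""
    return x * x + 1.0

def pipeline(xs, inject_rel_err=0.0):
    """End-to-end computation that calls the candidate function."""
    total = 0.0
    for x in xs:
        y = candidate(x)
        y *= 1.0 + random.uniform(-inject_rel_err, inject_rel_err)  # injected error
        total += y
    return total

random.seed(0)
inputs = [random.uniform(0, 10) for _ in range(1000)]
exact = pipeline(inputs)

for rel_err in (0.001, 0.01, 0.05, 0.1):
    trials = [abs(pipeline(inputs, rel_err) - exact) / abs(exact) for _ in range(30)]
    print(f"injected ±{rel_err:<5}: mean output error {statistics.mean(trials):.4%}")
```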
Understanding the performance-accuracy tradeoffs of floating-point arithmetic on GPUs
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167778
Sruthikesh Surineni, Ruidong Gu, Huyen Nguyen, M. Becchi
{"title":"Understanding the performance-accuracy tradeoffs of floating-point arithmetic on GPUs","authors":"Sruthikesh Surineni, Ruidong Gu, Huyen Nguyen, M. Becchi","doi":"10.1109/IISWC.2017.8167778","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167778","url":null,"abstract":"Floating-point computations produce approximate results, possibly leading to inaccuracy and reproducibility problems. Existing work addresses two issues: first, the design of high precision floating-point representations; second, the study of methods to trade off accuracy and performance of CPU applications. However, a comprehensive study of the tradeoffs between accuracy and performance on modern GPUs is missing. This study covers the use of different floating-point precisions (i.e., single and double floating-point precision in IEEE 754 standard, GNU Multiple Precision, and composite floating-point precision) on GPU using a variety of synthetic and real-world benchmark applications. First, we analyze the support for single and double precision floating-point arithmetic on different GPU architectures, and we characterize the latencies of all floating-point instructions on GPU. Second, we study the performance/accuracy tradeoffs related to the use of different arithmetic precisions on addition, multiplication, division, and natural exponential function. Third, we analyze the combined use of different arithmetic operations on three benchmark applications characterized by different instruction mixes and arithmetic intensities. As a result of this analysis, we provide insights to guide users to the selection of the arithmetic precision leading to a good performance/accuracy tradeoff depending on the arithmetic operations and mathematical functions used in their program and the degree of multithreading of the code.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126783781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
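The single- versus double-precision tradeoff studied in the entry above can be previewed with a quick NumPy experiment: sum a large array in float32 and float64 and compare both the error against a higher-precision reference and the runtime. This is a generic CPU-side illustration, not the paper's GPU methodology.

```python
# Quick illustration of the precision/performance tradeoff: accumulate a
# large sum in float32 vs. float64 and compare error and runtime. This runs
# on the CPU with NumPy; the paper's measurements target GPU arithmetic.
import time
import numpy as np

n = 10_000_000
rng = np.random.default_rng(0)
data64 = rng.random(n)                  # float64 source data
data32 = data64.astype(np.float32)

reference = float(np.sum(data64.astype(np.longdouble)))   # higher-precision reference

for name, arr in (("float32", data32), ("float64", data64)):
    start = time.perf_counter()
    total = float(np.sum(arr))
    elapsed = time.perf_counter() - start
    rel_err = abs(total - reference) / abs(reference)
    print(f"{name}: sum={total:.6f}  rel. error={rel_err:.2e}  time={elapsed * 1000:.1f} ms")
```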
Fine-grained energy profiling for deep convolutional neural networks on the Jetson TX1
2017 IEEE International Symposium on Workload Characterization (IISWC) Pub Date: 2017-10-01 DOI: 10.1109/IISWC.2017.8167764
Crefeda Faviola Rodrigues, G. Riley, M. Luján
{"title":"Fine-grained energy profiling for deep convolutional neural networks on the Jetson TX1","authors":"Crefeda Faviola Rodrigues, G. Riley, M. Luján","doi":"10.1109/IISWC.2017.8167764","DOIUrl":"https://doi.org/10.1109/IISWC.2017.8167764","url":null,"abstract":"Energy-use is a key concern when migrating current deep learning applications onto low power heterogeneous devices such as a mobile device. This is because deep neural networks are typically designed and trained on high-end GPUs or servers and require additional processing steps to deploy them on low power devices. Such steps include the use of compression techniques to scale down the network size or the provision of efficient device-specific software implementations. Migration is further aggravated by the lack of tools and the inability to measure power and performance accurately and consistently across devices. We present a novel evaluation framework for measuring energy and performance for deep neural networks using ARMs Streamline Performance Analyser integrated with standard deep learning frameworks such as Caffe and CuDNNv5. We apply the framework to study the execution behaviour of SqueezeNet on the Maxwell GPU of the NVidia Jetson TX1, on an image classification task (also known as inference) and demonstrate the ability to measure energy of specific layers of the neural network.","PeriodicalId":110094,"journal":{"name":"2017 IEEE International Symposium on Workload Characterization (IISWC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125783017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 20
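Per-layer energy numbers like those targeted in the entry above are typically obtained by integrating a sampled power trace over each layer's execution interval. The sketch below shows that integration with the trapezoidal rule; the power samples and layer timestamps are made-up values, and the paper's framework relies on ARM Streamline rather than this stub.

```python
# Compute per-layer energy by integrating a sampled power trace over each
# layer's start/end timestamps (trapezoidal rule). Samples and layer
# boundaries below are fabricated for illustration.
def energy_joules(timestamps_s, power_w, t_start, t_end):
    """Integrate power over [t_start, t_end] using the trapezoidal rule."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(zip(timestamps_s, power_w),
                                  zip(timestamps_s[1:], power_w[1:])):
        lo, hi = max(t0, t_start), min(t1, t_end)
        if hi <= lo:
            continue
        # Linearly interpolate power at the clipped interval endpoints.
        interp = lambda t: p0 + (p1 - p0) * (t - t0) / (t1 - t0)
        energy += 0.5 * (interp(lo) + interp(hi)) * (hi - lo)
    return energy

# Fabricated 1 kHz power trace (W) and layer boundaries (s).
ts = [i / 1000 for i in range(0, 101)]
power = [3.0 + 0.5 * (i % 10) / 10 for i in range(0, 101)]
layers = {"conv1": (0.000, 0.030), "fire2": (0.030, 0.075), "pool3": (0.075, 0.100)}

for name, (start, end) in layers.items():
    print(f"{name}: {energy_joules(ts, power, start, end) * 1000:.2f} mJ")
```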