Latest publications: 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2012)

Compression Speed Enhancements to LZO for Multi-core Systems
Jason Kane, Qing Yang
DOI: 10.1109/SBAC-PAD.2012.29
Abstract: This paper examines several promising throughput enhancements to the Lempel-Ziv-Oberhumer (LZO) 1x-1-15 data compression algorithm. Of the many algorithm variants present in the current library version, 2.06, LZO 1x-1-15 is considered the fastest, geared toward speed rather than compression ratio. We present several algorithm modifications tailored to modern multi-core architectures that are intended to increase compression speed while minimizing any loss in compression ratio. On average, the experimental results show that on a modern quad-core system, a 3.9x speedup in compression time is achieved over the baseline algorithm with no loss in compression ratio. Allowing for a 25% loss in compression ratio, up to a 5.4x speedup in compression time was observed.
Citations: 15
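The entry above is about raising compression throughput on multi-core hardware. A minimal sketch of the general idea, splitting the input into blocks and compressing them on worker threads, with zlib standing in for LZO (the block size, worker count, and function names are our illustration, not the paper's algorithm):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_blocks(data: bytes, block_size: int = 64 * 1024,
                    workers: int = 4) -> list:
    """Split the input into fixed-size blocks and compress them in parallel.

    zlib stands in for LZO here; a real LZO binding (e.g. python-lzo)
    exposes a similar one-shot compress() call.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))

def decompress_blocks(blocks: list) -> bytes:
    """Decompress and reassemble the independently compressed blocks."""
    return b"".join(zlib.decompress(b) for b in blocks)
```

Plain threads scale here because `zlib.compress` releases the GIL while compressing; per-block independence costs a little compression ratio, mirroring the speed-vs-ratio trade-off the abstract reports.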
An OS-Hypervisor Infrastructure for Automated OS Crash Diagnosis and Recovery in a Virtualized Environment
J. Jann, R. S. Burugula, Ching-Farn E. Wu, Kaoutar El Maghraoui
DOI: 10.1109/SBAC-PAD.2012.10
Abstract: Recovering from OS crashes has traditionally been done using reboot or checkpoint-restart mechanisms. Such techniques either fail to preserve the state before the crash happens or require modifications to applications. To eliminate these problems, we present a novel OS-hypervisor infrastructure for automated OS crash diagnosis and recovery in virtual servers. Our approach uses a small hidden OS-repair-image that is dynamically created from the healthy running OS instance. Upon an OS crash, the hypervisor automatically loads this repair-image to perform diagnosis and repair. The offending process is then quarantined, and the fixed OS automatically resumes running without a reboot. Our experimental evaluations demonstrated that it takes less than 3 seconds to recover from an OS crash. This approach can significantly reduce downtime and maintenance costs in data centers. This is the first design and implementation of an OS-hypervisor combo capable of automatically resurrecting a crashed commercial server OS.
Citations: 2
Energy Savings via Dead Sub-Block Prediction
M. Alves, Khubaib, Eiman Ebrahimi, V. Narasiman, Carlos Villavieja, P. Navaux, Y. Patt
DOI: 10.1109/SBAC-PAD.2012.30
Abstract: Cache memories have traditionally been designed to exploit spatial locality by fetching entire cache lines from memory upon a miss. However, recent studies have shown that the number of sub-blocks within a line that are actually used is often low. Furthermore, those sub-blocks that are used are accessed only a few times before becoming dead (i.e., never accessed again). This results in considerable energy waste since (1) data not needed by the processor is brought into the cache, and (2) data is kept alive in the cache longer than necessary. We propose the Dead Sub-Block Predictor (DSBP) to predict which sub-blocks of a cache line will actually be used and how many times each will be used, in order to bring into the cache only those sub-blocks that are necessary and power them off after they are touched the predicted number of times. We also use DSBP to identify dead lines (i.e., all sub-blocks off) and augment the existing replacement policy by prioritizing dead lines for eviction. Our results show a 24% energy reduction for the whole cache hierarchy when averaged over the SPEC2000, SPEC2006, and NAS-NPB benchmarks.
Citations: 19
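The mechanism in this abstract, learning per-sub-block usage so later misses fetch only what will be touched, can be sketched as a small prediction table. This is an illustrative shape only (keyed by miss PC, with a fetch-everything default), not the paper's exact table organization:

```python
from collections import defaultdict

SUB_BLOCKS = 8  # sub-blocks per cache line (an assumed geometry)

class DeadSubBlockPredictor:
    """Sketch of the DSBP idea: learn, per missing PC, which sub-blocks
    of a line get used and how many times, so a later miss from the same
    PC fetches only those sub-blocks and powers each off after its
    predicted number of touches."""

    def __init__(self):
        # pc -> predicted per-sub-block use counts; unknown PCs
        # conservatively fetch every sub-block once
        self.table = defaultdict(lambda: [1] * SUB_BLOCKS)

    def predict(self, pc: int) -> list:
        """Counts to fetch with: 0 means 'dead, do not fetch'."""
        return list(self.table[pc])

    def train(self, pc: int, observed_counts: list) -> None:
        """On eviction, record the per-sub-block usage actually observed."""
        self.table[pc] = list(observed_counts)
```

A line whose prediction is all zeros is a dead line, which is exactly what the paper's replacement-policy extension prioritizes for eviction.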
Scalable Thread Scheduling in Asymmetric Multicores for Power Efficiency
Rance Rodrigues, A. Annamalai, I. Koren, S. Kundu
DOI: 10.1109/SBAC-PAD.2012.40
Abstract: The emergence of asymmetric multicore processors (AMPs) has elevated the problem of thread scheduling in such systems. The computing needs of a thread often vary during its execution (phases); hence, reassigning threads to cores (thread swapping) upon detection of such a change can significantly improve the AMP's power efficiency. Even though identifying a change in the resource requirements of a workload is straightforward, determining the thread reassignment is a challenge. Traditional online learning schemes rely on sampling to determine the best thread-to-core assignment in AMPs. However, as the number of cores in the multicore increases, the sampling overhead may be too large. In this paper, we propose a novel technique to dynamically assess the current thread-to-core assignment and determine whether swapping the threads between the cores will be beneficial and achieve a higher performance/Watt. This decision is based on estimating the expected performance and power of the current program phase on other cores. This estimation is done using the values of selected performance counters in the host core. By estimating the expected performance and power on each core type, informed thread scheduling decisions can be made while avoiding the overhead associated with sampling. We illustrate our approach using an 8-core high-performance/low-power AMP and show the performance/Watt benefits of the proposed dynamic thread scheduling technique. We compare our proposed scheme against previously published schemes based on online learning and two schemes based on the use of an oracle, one static and the other dynamic. Our results show that significant performance/Watt gains can be achieved through informed thread scheduling decisions in AMPs.
Citations: 19
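The swap decision the abstract describes, comparing estimated performance/Watt of the current assignment against the swapped one, reduces to a small arithmetic check once the counter-based estimates exist. A toy two-thread sketch (the dictionary shape and thread/core names are our illustration, not the paper's interface):

```python
def perf_per_watt(ipc: float, watts: float) -> float:
    """Performance/Watt for one (thread, core) pairing."""
    return ipc / watts

def should_swap(est: dict) -> bool:
    """est[(thread, core)] = (ipc, watts), estimated from performance
    counters sampled on the host core. Returns True if swapping the two
    threads between the big and little cores raises total perf/Watt."""
    current = (perf_per_watt(*est[("t0", "big")]) +
               perf_per_watt(*est[("t1", "little")]))
    swapped = (perf_per_watt(*est[("t0", "little")]) +
               perf_per_watt(*est[("t1", "big")]))
    return swapped > current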
Parallel Exact Inference on Multicore Using MapReduce
N. Ma, Yinglong Xia, V. Prasanna
DOI: 10.1109/SBAC-PAD.2012.43
Abstract: Inference is a key problem in exploring probabilistic graphical models for machine learning algorithms. Recently, many parallel techniques have been developed to accelerate inference. However, these techniques are not widely used due to their implementation complexity. MapReduce provides an appealing programming model that has been increasingly used to develop parallel solutions, though it has mainly been used for data-parallel applications. In this paper, we investigate the use of MapReduce for exact inference in Bayesian networks. MapReduce-based algorithms are proposed for evidence propagation in junction trees. We evaluate our methods on general-purpose multi-core machines using Phoenix as the underlying MapReduce runtime. The experimental results show that our methods achieve 20x speedup on an Intel Westmere-EX based system.
Citations: 5
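To make the fit between MapReduce and inference concrete: a per-clique step of evidence propagation is a sum-product over a potential table, which maps naturally onto a map (emit entries keyed by the variables kept) and a reduce (sum out the rest). A minimal in-process driver and a toy marginalization, purely illustrative and far simpler than the paper's junction-tree algorithms:

```python
from collections import defaultdict
from itertools import product

def mapreduce(records, mapper, reducer):
    """Minimal sequential MapReduce driver in the shape Phoenix provides:
    map each record to (key, value) pairs, group by key, reduce groups."""
    groups = defaultdict(list)
    for rec in records:
        for key, val in mapper(rec):
            groups[key].append(val)
    return {k: reducer(vs) for k, vs in groups.items()}

# Marginalize X out of a joint potential P(X, Y): key each table entry
# by the retained variable Y and sum the probabilities per key.
joint = {(x, y): 0.25 for x, y in product([0, 1], [0, 1])}  # uniform toy table

marginal_y = mapreduce(
    joint.items(),
    mapper=lambda kv: [(kv[0][1], kv[1])],  # key by Y, emit probability
    reducer=sum,
)
```

A real runtime executes the map and reduce phases across cores; the programming-model appeal the abstract cites is that the user only writes the two small functions above.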
Cloud Workload Analysis with SWAT
M. Breternitz, Keith Lowery, Anton Charnoff, Patryk Kamiński, Leonardo Piga
DOI: 10.1109/SBAC-PAD.2012.13
Abstract: This note describes the Synthetic Workload Application Toolkit (SWAT) and presents the results from a set of experiments on some key cloud workloads. SWAT is a software platform that automates the creation, deployment, provisioning, execution, and (most importantly) data gathering of synthetic compute workloads on clusters of arbitrary size. SWAT collects and aggregates data from application execution logs, operating system call interfaces, and microarchitecture-specific program counters. The data collected by SWAT are used to characterize the effects of network traffic, file I/O, and computation on program performance. The output is analyzed to provide insight into the design and deployment of cloud workloads and systems. Each workload is characterized according to its scalability with the number of server nodes and Hadoop server jobs, its sensitivity to network characteristics (bandwidth, latency, statistics on packet size), and its computation vs. I/O intensity as these values are adjusted via workload-specific parameters. (In the future, we will use SWAT's benchmark synthesizer capability.) We also characterize microarchitectural characteristics that give insight into the microarchitecture of processors better suited for this class of workloads. We contrast our results with prior work on CloudSuite [5], validating some conclusions and providing further insight into others. This illustrates SWAT's data collection capabilities and its usefulness for obtaining insight into cloud applications and systems.
Citations: 8
Scalable Algorithms for Distributed-Memory Adaptive Mesh Refinement
Akhil Langer, J. Lifflander, P. Miller, K. Pan, L. Kalé, P. Ricker
DOI: 10.1109/SBAC-PAD.2012.48
Abstract: This paper presents scalable algorithms and data structures for adaptive mesh refinement computations. We describe a novel mesh restructuring algorithm for adaptive mesh refinement computations that uses a constant number of collectives regardless of the refinement depth. To further increase scalability, we describe a localized hierarchical coordinate-based block indexing scheme, in contrast to traditional linear numbering schemes, which incur unnecessary synchronization. In contrast to existing approaches, which take O(P) time and storage per process, our approach takes only constant time and has a very small memory footprint. With these optimizations as well as an efficient mapping scheme, our algorithm is scalable and suitable for large, highly refined meshes. We present strong-scaling experiments up to 2k ranks on Cray XK6 and 32k ranks on IBM Blue Gene/Q.
Citations: 29
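A hierarchical coordinate-based block index of the kind this abstract contrasts with linear numbering can be computed purely locally: the index is the path of child choices from the root to the block. A 2D quadtree sketch (2D and this particular bit layout are our simplification for illustration):

```python
def block_index(level: int, x: int, y: int) -> tuple:
    """Hierarchical coordinate index of the block at (x, y) on a given
    refinement level of a 2D quadtree: the sequence of child quadrants
    (0-3) from the root down to the block. Computed in O(level) from the
    block's own coordinates, with no communication or global renumbering."""
    path = []
    for l in range(level - 1, -1, -1):
        quadrant = (((y >> l) & 1) << 1) | ((x >> l) & 1)
        path.append(quadrant)
    return tuple(path)
```

Because a child's index is its parent's index with one quadrant appended, refining or coarsening a block never forces neighbors to renumber, which is the synchronization a global linear scheme would incur.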
CSHARP: Coherence and SHaring Aware Cache Replacement Policies for Parallel Applications
Biswabandan Panda, S. Balachandran
DOI: 10.1109/SBAC-PAD.2012.27
Abstract: Parallel applications are becoming mainstream, and architectural techniques for multicores that target these applications are the need of the hour. Sharing of data by multiple threads and issues due to data coherence are unique to parallel applications. We propose CSHARP, a hardware framework that brings coherence and sharing awareness to any shared last-level cache replacement policy. We use the degree of sharing of cache lines and the information present in coherence vectors to make replacement decisions. We apply CSHARP to a state-of-the-art cache replacement policy called TA-DRRIP to show its effectiveness. Our experiments on a simulated four-core system show that applying CSHARP to TA-DRRIP gives an extra 10% reduction in miss rate at the LLC. Compared to the LRU policy, CSHARP on TA-DRRIP shows an 18% miss-rate reduction and a 7% performance boost. We also show the scalability of our proposal by studying the hardware overhead and performance on an 8-core system.
Citations: 9
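The core intuition, bias eviction away from widely shared lines using the sharer information the coherence vectors already hold, can be shown in a few lines. This sketch layers the idea on plain LRU rather than TA-DRRIP, and the tuple layout is our own:

```python
def choose_victim(lines):
    """Sharing-aware victim selection in the spirit of CSHARP: among a
    set's lines, evict the one with the fewest sharers, breaking ties by
    LRU age. Each line is (tag, sharer_count, lru_age), where sharer_count
    would come from the line's coherence vector."""
    # max over (-sharers, age): fewest sharers first, then the oldest line
    return max(lines, key=lambda ln: (-ln[1], ln[2]))
```

Lines cached by several private caches are the ones whose eviction hurts multiple threads at once, which is why sharer count is a useful second signal beyond recency.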
Integrating Dataflow Abstractions into the Shared Memory Model
Vladimir Gajinov, Srdjan Stipic, O. Unsal, T. Harris, E. Ayguadé, A. Cristal
DOI: 10.1109/SBAC-PAD.2012.24
Abstract: In this paper we present the Atomic Dataflow (ADF) model, a new task-based parallel programming model for C/C++ which integrates dataflow abstractions into the shared memory programming model. The ADF model provides pragma directives that allow a programmer to organize a program into a set of tasks and to explicitly define input data for each task. The task dependency information is conveyed to the ADF runtime system, which constructs the dataflow task graph and builds the necessary infrastructure for dataflow execution. Additionally, the ADF model allows tasks to share data. The key idea is that computation is triggered by dataflow between tasks but that, within a task, execution occurs by making atomic updates to common mutable state. To that end, the ADF model employs transactional memory, which guarantees atomicity of shared memory updates. We show examples that illustrate how the programmability of shared memory can be improved using the ADF model. Moreover, our evaluation shows that the ADF model performs well in comparison with programs parallelized using OpenMP and transactional memory.
Citations: 15
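The key idea, dataflow triggering between tasks plus atomic updates to shared state within a task, can be mimicked in a toy runtime. Everything below is our own shape (ADF itself uses C/C++ pragmas and transactional memory; a lock stands in for the transactions):

```python
import threading

class ADFStyleGraph:
    """Toy sketch of the ADF execution model: tasks declare their input
    tokens and fire once all of them have arrived; inside a task, shared
    mutable state is updated atomically."""

    def __init__(self):
        self.shared = {}        # common mutable state
        self._lock = threading.Lock()
        self._pending = []      # (needed input names, task body)
        self._tokens = {}       # dataflow tokens that have arrived

    def task(self, *inputs):
        """Decorator: register a task with its explicit input set."""
        def register(fn):
            self._pending.append((set(inputs), fn))
            return fn
        return register

    def put(self, name, value):
        """Deliver a token; fire any task whose inputs are now complete."""
        self._tokens[name] = value
        for needed, fn in [t for t in self._pending
                           if t[0] <= self._tokens.keys()]:
            self._pending.remove((needed, fn))
            fn({k: self._tokens[k] for k in needed})

    def atomic(self, update):
        with self._lock:        # stand-in for a memory transaction
            update(self.shared)

g = ADFStyleGraph()

@g.task("a", "b")
def add(inp):
    g.atomic(lambda s: s.__setitem__("sum", inp["a"] + inp["b"]))

g.put("a", 2)
g.put("b", 3)   # second token arrives, so the task fires
```

The separation matters: the graph decides *when* a task runs (pure dataflow), while the atomic section decides *how* it may touch state other tasks also see.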
Assessing Energy Efficiency of Fault Tolerance Protocols for HPC Systems
Esteban Meneses, O. Sarood, L. Kalé
DOI: 10.1109/SBAC-PAD.2012.12
Abstract: An exascale machine is expected to be delivered in the 2018-2020 time frame. Such a machine will be able to tackle some of the hardest computational problems and to extend our understanding of Nature and the universe. However, to make that a reality, the HPC community has to solve a few important challenges. Resilience will become a prominent problem because an exascale machine will experience frequent failures due to the large number of components it will encompass. Some form of fault tolerance has to be incorporated in the system to keep the progress rate of applications as high as possible. In parallel, the system will have to be more careful about power management. There are two dimensions to power. First, in a power-limited environment, all the layers of the system have to adhere to that limitation (including the fault tolerance layer). Second, power will be relevant due to energy consumption: an exascale installation will have to pay a large energy bill. It is fundamental to increase our understanding of the energy profile of different fault tolerance schemes. This paper presents an evaluation of three different fault tolerance approaches: checkpoint/restart, message-logging, and parallel recovery. Using programs from different programming models, we show parallel recovery is the most energy-efficient solution for an execution with failures. At the same time, parallel recovery is able to finish the execution faster than the other approaches. We explore the behavior of these approaches at extreme scales using an analytical model. At large scale, parallel recovery is predicted to reduce the total execution time of an application by 17% and the energy consumption by 13% compared to checkpoint/restart.
Citations: 47
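Why parallel recovery wins in such a model is easy to see with a back-of-the-envelope version of the analysis: failures force rework, and spreading that rework across processors shrinks it. The formula below is our simplification for intuition, not the paper's analytical model:

```python
def expected_time(work, ckpt_interval, ckpt_cost, mtbf, recovery_speedup=1.0):
    """Expected total run time with periodic checkpoints and failures.

    work            -- failure-free compute time
    ckpt_interval   -- compute time between checkpoints
    ckpt_cost       -- cost of writing one checkpoint
    mtbf            -- mean time between failures
    recovery_speedup-- 1.0 models checkpoint/restart; >1 models parallel
                       recovery redoing the lost work across many processors
    """
    n_ckpts = work / ckpt_interval
    overhead = n_ckpts * ckpt_cost
    base = work + overhead
    failures = base / mtbf                      # expected failure count
    # each failure loses half an interval on average, plus a restart cost
    rework = failures * (ckpt_interval / 2 + ckpt_cost) / recovery_speedup
    return base + rework
```

In this toy model both protocols pay the same checkpointing overhead; parallel recovery only shortens the rework term, which is also roughly why its energy advantage in the paper (13%) is smaller than one might naively expect.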