MIL: A language to build program analysis tools through static binary instrumentation
Andres Charif Rubial, Denis Barthou, Cédric Valensi, S. Shende, A. Malony, W. Jalby
20th Annual International Conference on High Performance Computing (HiPC 2013). DOI: 10.1109/HiPC.2013.6799106
Abstract: As software complexity increases, analyzing code behavior during execution becomes more important. Instrumentation techniques, which insert code directly into binaries, are essential for the program analyses used in debugging, runtime profiling, and performance evaluation. In the context of high-performance parallel applications, building an instrumentation framework is quite challenging. One difficulty is the need to capture both coarse-grain behavior, such as the execution time of different functions, and finer-grain actions, in order to pinpoint performance issues. In this paper, we propose MIL, a language for developing program analysis tools based on static binary instrumentation. The key feature of MIL is to ease the integration of static, global program analysis with instrumentation. We show how this enables both precise targeting of the code regions to analyze and a better understanding of the optimized program's behavior.
Transaction scheduling using conflict avoidance and Contention Intensity
M. Pereira, A. Baldassin, G. Araújo, L. E. Buzato
HiPC 2013. DOI: 10.1109/HiPC.2013.6799126
Abstract: In the last few years, Transactional Memories (TMs) have been shown to be a parallel programming model that effectively combines performance improvement with ease of programming. Moreover, the recent introduction of TM-based ISA extensions by major microprocessor manufacturers also seems to endorse TM as a programming model for today's parallel applications. One of the central issues in designing Software TM (STM) systems is identifying mechanisms and heuristics that minimize the contention arising from conflicting transactions. Although a number of mechanisms have been proposed to tackle contention, such techniques have limited scope, as conflict is avoided by either interrupting or serializing transaction execution, considerably impacting performance. To deal with this limitation, we previously proposed an effective, fully cooperative transaction scheduler, along with a conflict-avoidance heuristic, that replaces a conflicting transaction with another that has a lower conflict probability. This paper extends that framework and introduces a new heuristic, built by combining our previous conflict-avoidance technique with the Contention Intensity heuristic proposed by Yoo and Lee. Experimental results, obtained using the STMBench7 and STAMP benchmarks atop tinySTM, show that the proposed heuristic produces significant speedups compared to four other solutions.
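The Contention Intensity heuristic mentioned above can be sketched in a few lines: each thread keeps an exponential moving average of recent transaction outcomes and serializes when it rises past a threshold. This is a minimal illustrative sketch; the class name, `alpha`, and `threshold` values are assumptions for illustration, not values from the paper.

```python
# Hedged sketch of a Contention Intensity (CI) style heuristic: CI is an
# exponential moving average of transaction outcomes (1 = abort, 0 = commit).
# When CI exceeds a threshold, the scheduler serializes transactions instead
# of running them concurrently. alpha/threshold below are illustrative only.

class ContentionTracker:
    """Tracks per-thread contention via an exponential moving average."""

    def __init__(self, alpha=0.5, threshold=0.5):
        self.alpha = alpha          # weight given to past contention
        self.threshold = threshold  # above this, serialize transactions
        self.ci = 0.0               # current Contention Intensity

    def record(self, aborted):
        """Update CI after a commit (aborted=False) or an abort (aborted=True)."""
        outcome = 1.0 if aborted else 0.0
        self.ci = self.alpha * self.ci + (1.0 - self.alpha) * outcome

    def should_serialize(self):
        """High contention: hand the next transaction to a serial queue."""
        return self.ci > self.threshold
```

After repeated aborts CI climbs toward 1 and the tracker votes to serialize; commits pull it back toward 0, so concurrency resumes once contention subsides.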
Multi-tier energy buffering management for IDCs with heterogeneous energy storage devices
Z. Abbasi, Madhurima Pore, Ayan Banerjee, S. Gupta
HiPC 2013. DOI: 10.1109/HiPC.2013.6799104
Abstract: Energy buffering has been proposed to store renewable energy and low-cost electricity in Energy Storage Devices (ESDs) and use it judiciously to reduce the electricity bill of Internet data centers (IDCs). Recent research has considered long-term variation in electricity price, renewable power, and workload, and has shown the efficiency of energy buffering in reducing the electricity bill. However, these aspects of data centers exhibit both long- and short-term variation. Further, there is inherent heterogeneity in ESD physical characteristics (e.g., charging and discharging rates). We hypothesize that multi-tier energy buffering management can leverage the heterogeneity in ESD characteristics and better optimize the utilization of renewable energy and low-cost power in the presence of both short- and long-term variability in a data center. This paper presents an analytical study of a multi-tier workload and energy buffering management technique that frames each tier as an optimization problem and solves them online and proactively using Receding Horizon Control (RHC). Our study shows that multi-tier energy buffering management increases the utilization of renewables by up to two times compared to one-tier management.
LiPS: A cost-efficient data and task co-scheduler for MapReduce
M. Ehsan, Yao Chen, Hui Kang, R. Sion, Jennifer L. Wong
HiPC 2013. DOI: 10.1109/HiPC.2013.6799103
Abstract: We introduce LiPS, a new cost-efficient data and task co-scheduler for MapReduce in a cloud environment. By using linear programming to co-schedule data and tasks simultaneously, LiPS achieves a globally minimized dollar cost. We evaluated LiPS both analytically and on Amazon EC2 in order to measure actual dollar charges. The results were significant: LiPS saved 62-81% of the dollar costs compared with the Hadoop default scheduler and the delay scheduler, while also allowing users to fine-tune the cost-performance tradeoff.
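The core idea of co-scheduling data and tasks can be illustrated on a toy instance. LiPS solves this with a linear program; the sketch below instead brute-forces the same kind of objective, which is only feasible for tiny inputs. All names, node labels, and cost figures here are invented for illustration and do not come from the paper.

```python
# Toy illustration of joint data/task placement as dollar-cost minimization.
# For each data block we choose a node to store it and a node to run its task,
# paying a transfer cost when the task reads its block remotely. LiPS encodes
# this as a linear program; here we simply enumerate all placements.
from itertools import product

def co_schedule(blocks, nodes, storage_cost, compute_cost, transfer_cost):
    """Return (best_cost, placement) where placement[b] = (data_node, task_node).

    storage_cost[n] / compute_cost[n]: per-block dollar cost on node n.
    transfer_cost: extra dollars when a task reads its block from a remote node.
    """
    best_cost, best = float("inf"), None
    # Every block independently picks a (data node, task node) pair.
    for choice in product(product(nodes, repeat=2), repeat=len(blocks)):
        cost = 0.0
        for data_node, task_node in choice:
            cost += storage_cost[data_node] + compute_cost[task_node]
            if data_node != task_node:      # non-local read
                cost += transfer_cost
        if cost < best_cost:
            best_cost, best = cost, dict(zip(blocks, choice))
    return best_cost, best
```

With a high transfer cost, the minimizer co-locates each task with its block on the cheapest node, which is exactly the coupling between data placement and task placement that a joint formulation captures and per-task schedulers miss.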
A hybrid parallelization approach for high resolution operational flood forecasting
Swati Singhal, L. V. Real, Thomas George, Sandhya Aneja, Yogish Sabharwal
HiPC 2013. DOI: 10.1109/HiPC.2013.6799142
Abstract: Accurate and timely flood forecasts are becoming essential due to the increased incidence of flood-related disasters over the last few years. Such forecasts require a high-resolution integrated flood modeling approach. In this paper, we present an integrated flood forecasting system with an automated workflow spanning the weather modeling, surface runoff estimation, and water routing components. We primarily focus on the water routing process, the most compute-intensive phase, and present two parallelization strategies to scale it to large grid sizes. Specifically, we employ a nature-inspired decomposition of the simulation domain into watershed basins and propose a master-slave model of parallelization for distributed processing of the basins. We also propose an intra-basin shared-memory parallelization approach using OpenMP. Empirical evaluation of the proposed parallelization strategies indicates a potential for high speedups in certain scenarios (e.g., a speedup of 13× with 16 threads using OpenMP parallelization for the large Rio de Janeiro basin).
Speculative dynamic vectorization to assist static vectorization in a HW/SW co-designed environment
Rakesh Kumar, Alejandro Martínez, Antonio González
HiPC 2013. DOI: 10.1109/HiPC.2013.6799102
Abstract: Compiler-based static vectorization is widely used to extract data-level parallelism from computation-intensive applications. Static vectorization is very effective for traditional array-based applications. However, the compiler's inability to reorder ambiguous memory references severely limits vectorization opportunities, especially in pointer-rich applications. HW/SW co-designed processors provide an excellent opportunity to optimize applications at runtime: the availability of dynamic application behavior helps capture vectorization opportunities generally missed by compilers. This paper proposes to complement static vectorization with a speculative dynamic vectorizer in a HW/SW co-designed processor. We present a speculative dynamic vectorization algorithm that speculatively reorders ambiguous memory references to uncover vectorization opportunities. The hardware checks for any memory dependence violation due to speculative vectorization and takes corrective action in case of violation. Our experiments show that the combined (static + dynamic) vectorization approach provides a 2x performance benefit over static vectorization alone for SPECFP2006. Moreover, the dynamic vectorization scheme is as effective at vectorizing pointer-based applications as array-based ones, whereas compilers lose significant vectorization opportunities in pointer-based applications.
Loop level speculation in a task based programming model
Rahulkumar Gayatri, Rosa M. Badia, E. Ayguadé
HiPC 2013. DOI: 10.1109/HiPC.2013.6799132
Abstract: Uncountable loops (such as while-loops in C) and if-conditions are among the most common constructs in programming. While-loops are widely used, for example, to determine convergence in linear algebra algorithms or in goal-finding problems from graph algorithms. In general, while-loops are used whenever the loop iteration space (the number of iterations the loop executes) is unknown. Usually the execution of the next iteration of a while-loop is decided inside the current iteration (i.e., the execution of iteration i depends on values computed in iteration i-1). This precludes their parallel execution on today's ubiquitous multi-core architectures. In this paper, a technique is proposed to speculatively create parallel tasks from the next iterations before the current one completes. If consecutive loop iterations are only control dependent, then multiple iterations can be executed simultaneously; later in the execution path, the runtime system decides either to commit the results of such speculatively executed iterations or to undo the changes made by them. Data dependences within and between non-speculative and speculative work are honored to guarantee correctness. The proposed technique is implemented in SMPSs, a task-based dataflow programming model for shared-memory multiprocessor architectures. The approach is evaluated on a set of applications from graph algorithms and linear algebra. Results are promising, with an average speedup increase of 1.2x with 16 threads compared to non-speculative execution of the applications. The increase in speedup is significant, since the performance gain is achieved over already parallelized versions of the benchmarks.
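The commit-or-undo decision described above can be sketched sequentially: run a window of iterations speculatively, then commit them in order only up to the iteration that decides the loop exits, squashing the rest. This is a simplified simulation under assumptions (the `window` parameter and function names are invented); SMPSs would execute the speculative iterations as real parallel tasks rather than in a loop.

```python
# Sketch of loop-level speculation for a while-loop: iterations beyond the
# current one are executed speculatively, then committed in program order.
# Iterations that run past the loop's actual exit are discarded (rolled back).

def speculative_while(body, state, window=4, max_iters=1000):
    """body(state, i) -> (new_state, continue_flag); window = speculation depth."""
    committed = []
    i = 0
    while i < max_iters:
        # Speculatively run a whole window of iterations; each one sees the
        # state produced by its predecessor within the window.
        speculative, s = [], state
        for j in range(i, i + window):
            s, cont = body(s, j)
            speculative.append((s, cont))
        # Commit in order; everything after the first cont=False is squashed,
        # which models undoing the changes of mis-speculated iterations.
        for s, cont in speculative:
            committed.append(s)
            state = s
            if not cont:
                return state, committed
        i += window
    return state, committed
```

For a convergence-style loop (double a value until it reaches a bound), the window past the exit point is computed but never committed, mirroring the runtime's rollback of mis-speculated tasks.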
Conflict-free data access for multi-bank memory architectures using padding
Joar Sohl, Jian Wang, Andreas Karlsson, Dake Liu
HiPC 2013. DOI: 10.1109/HiPC.2013.6799112
Abstract: For high-performance computation, memory access is a major issue. Whether on a supercomputer, a GPGPU device, or an Application Specific Instruction set Processor (ASIP) for Digital Signal Processing (DSP), parallel execution is a necessity. A high rate of computation puts pressure on memory access, and it is often non-trivial to maximize the data rate to the execution units. Many algorithms that, from a computational point of view, could be implemented efficiently on parallel architectures fail to achieve significant speedups; very often the reason is that the available execution units are poorly utilized due to inefficient data access. This paper shows a method for improving the access time for data access sequences that are completely static, at the cost of extra memory, by resolving memory bank conflicts through padding. The method can be applied automatically, and it is shown to significantly reduce data access time for sorting and FFTs. Execution time is improved by up to a factor of 3.4 for the FFT and by up to a factor of 8 for sorting.
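The bank-conflict problem that padding solves is easy to demonstrate: with B banks and element e mapped to bank e mod B, striding through a row-major array whose row width is a multiple of B hits a single bank on every access. The sketch below (function name and sizes are illustrative, not from the paper) shows how one padding element per row restores conflict-free access.

```python
# Sketch of padding for multi-bank memories: bank(element) = element % banks.
# Reading column 0 of a row-major width x height array touches element
# row * width for each row; if width is a multiple of the bank count, every
# access lands in the same bank and is serialized. Padding each row with one
# unused element changes the stride and spreads accesses over all banks.

def banks_hit(width, height, banks, pad=0):
    """Distinct banks touched when reading column 0 of a (padded) array."""
    stride = width + pad
    return len({(row * stride) % banks for row in range(height)})
```

With 8 banks and 16-element rows, a column walk hits 1 bank unpadded and all 8 banks with a single element of padding per row, trading height extra elements of memory for fully parallel access, which is the static space-for-time trade the paper automates.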
Accelerating inclusion-based pointer analysis on heterogeneous CPU-GPU systems
Yu Su, Ding Ye, Jingling Xue
HiPC 2013. DOI: 10.1109/HiPC.2013.6799110
Abstract: This paper describes the first implementation of Andersen's inclusion-based pointer analysis for C programs on a heterogeneous CPU-GPU system, where both its CPU and GPU cores are used. As an important graph algorithm, Andersen's analysis is difficult to parallelise because it makes extensive modifications to the structure of the underlying graph, in a way that is highly input-dependent and statically hard to analyse. Existing parallel solutions run on either the CPU or the GPU but not both, leaving the underlying computational resources underutilised and making the ratios of CPU-only over GPU-only speedups for certain programs (i.e., graphs) unpredictable. We observe that a naive parallel solution of Andersen's analysis on a CPU-GPU system suffers from poor performance due to workload imbalance. We introduce a solution centered around a new dynamic workload distribution scheme. The novelty lies in prioritising the distribution of different types of workloads (i.e., the graph-rewriting rules in Andersen's analysis) to the CPU or GPU according to each processing unit's suitability for processing them. This scheme is effective when combined with synchronisation-free execution of tasks (i.e., graph-rewriting rules) and difference propagation of points-to information between the CPU and GPU. For a set of seven C benchmarks evaluated, our CPU-GPU solution outperforms (on average) (1) the CPU-only solution by 50.6%, (2) the GPU-only solution by 78.5%, and (3) an oracle solution that behaves as the faster of (1) and (2) on every benchmark by 34.6%.
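For readers unfamiliar with the algorithm being parallelized, Andersen's analysis computes points-to sets as a fixpoint over four inclusion constraints derived from C statements: p = &a (address-of), p = q (copy), p = *q (load), and *p = q (store). The naive sequential solver below is only a baseline sketch; the paper's contribution is distributing these rule applications across CPU and GPU, which this sketch does not attempt.

```python
# Minimal sequential fixpoint solver for Andersen's inclusion-based pointer
# analysis. pts[v] is the points-to set of variable v; the rules add
# inclusion edges until no set changes:
#   p = &a : a in pts[p]            p = q  : pts[p] ⊇ pts[q]
#   p = *q : pts[p] ⊇ pts[r] for each r in pts[q]
#   *p = q : pts[r] ⊇ pts[q] for each r in pts[p]

def andersen(address, copy, load, store, variables):
    pts = {v: set() for v in variables}
    for p, a in address:                       # p = &a
        pts[p].add(a)
    changed = True
    while changed:
        changed = False
        for p, q in copy:                      # p = q
            if not pts[q] <= pts[p]:
                pts[p] |= pts[q]; changed = True
        for p, q in load:                      # p = *q
            for r in list(pts[q]):
                if not pts[r] <= pts[p]:
                    pts[p] |= pts[r]; changed = True
        for p, q in store:                     # *p = q
            for r in list(pts[p]):
                if not pts[q] <= pts[r]:
                    pts[r] |= pts[q]; changed = True
    return pts
```

Because load and store rules consult points-to sets that are still growing, rule applications both read and rewrite the constraint graph, which is exactly the input-dependent mutation that makes the analysis hard to parallelize.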
Share-o-meter: An empirical analysis of KSM based memory sharing in virtualized systems
Shashank Rachamalla, Debadatta Mishra, Purushottam Kulkarni
HiPC 2013. DOI: 10.1109/HiPC.2013.6799096
Abstract: Content-based memory sharing in virtualized environments has proven to be a useful technique for over-commitment-based placement of virtual machines. The Kernel-based Virtual Machine (KVM) on Linux uses Kernel SamePage Merging (KSM) to identify and exploit sharing opportunities. In this paper, we present an analysis of page sharing across virtual machines by comparing the page sharing achieved by KSM to the total sharing opportunities presented by the virtual machines. We study the impact of different KSM configurations, system resources, and workload characteristics on the page sharing achieved by KSM. We also study the cost of sharing in terms of the CPU utilization overhead from copy-on-write page breaks that occur on KSM-shared pages. Our analysis is aimed at exploring the KSM configuration space to obtain desired sharing levels with minimal overheads for a given amount of system resources and workload characteristics. Our empirical analysis shows that workloads exhibiting different memory usage patterns require different KSM configuration parameters to achieve maximum savings. We quantify the levels of savings and associated costs for several workloads (individually and in combination) exhibiting different sharing opportunities and memory usage characteristics. Further, we demonstrate the need for adaptive configuration of KSM's aggressiveness based on changes in the total memory available for sharing and in memory usage characteristics.
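The "total sharing opportunity" that the paper compares KSM against can be estimated by grouping pages by content: every group of n identical pages across VMs needs only one physical copy, freeing n - 1 pages. The sketch below illustrates that upper bound only; KSM itself finds duplicates by scanning and comparing page contents in trees and backs merged pages with copy-on-write mappings, none of which is modeled here.

```python
# Back-of-the-envelope estimate of the page-sharing opportunity KSM exploits:
# pages with identical content across VMs can be backed by a single physical
# page. Pages are represented here as plain strings standing in for 4 KiB
# of content; each group of n identical pages frees n - 1 physical pages.
from collections import Counter

def mergeable_pages(vm_pages):
    """vm_pages: dict vm_name -> list of page contents. Returns pages freed."""
    counts = Counter(page for pages in vm_pages.values() for page in pages)
    return sum(n - 1 for n in counts.values() if n > 1)
```

Two VMs booted from the same image share zero-filled pages and common library pages, so the estimate is high; the paper's point is that what KSM actually achieves, and at what CPU cost, depends heavily on its scan-rate configuration relative to this opportunity.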