{"title":"Handling Conflicts with Compiler's Help in Software Transactional Memory Systems","authors":"Sandya Mannarswamy, R. Govindarajan","doi":"10.1109/ICPP.2010.56","DOIUrl":"https://doi.org/10.1109/ICPP.2010.56","url":null,"abstract":"Atomic sections are supported in software through the use of optimistic concurrency by using Software Transactional Memory (STM). However STM implementations incur high overheads which reduce the wide-spread use of this approach by programmers. Conflicts are a major source of overheads in STMs. The basic performance premise of a transactional memory system is the optimistic concurrency principle wherein data updates executed by the transactions are to disjoint objects/memory locations, referred to as Disjoint Access Parallel (DAP). Otherwise, the updates conflict, and all but one of the transactions are aborted. Such aborts result in wasted work and performance degradation. While contention management systems in STM implementations try to reduce conflicts by various runtime feedback control mechanisms, they are not aware of the application’s structure and data access patterns and hence typically act after the conflicts have occurred. In this paper we propose a scheme based on compiler analysis, which can identify static atomic sections whose instances, when executed concurrently by more than one thread always conflict. Such an atomic section is referred to as Always Conflicting Atomic Section (ACAS). We propose and evaluate two techniques Selective Pessimistic Concurrency Control (SPCC) and compiler inserted Early Conflict Checks (ECC) which can help reduce the STM overheads caused by ACAS. We show that these techniques help reduce the aborts in 4 of the STAMP benchmarks by up to 27.52% while improving performance by 1.24% to 19.31%.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126684772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Gray-Box Feedback Control Approach for System-Level Peak Power Management","authors":"Jiayu Gong, Chengzhong Xu","doi":"10.1109/ICPP.2010.63","DOIUrl":"https://doi.org/10.1109/ICPP.2010.63","url":null,"abstract":"Power consumption has become one of the most important design considerations for modern high density servers. To avoid system failures caused by power capacity overload or overheating, system-level power management is required. This kind of management needs to control power consumption precisely. Conventional solutions to this problem mostly rely on feedback controllers which only concern the power itself, known as black-box approaches. They may not respond to the variation of system quickly. This paper presents a gray-box strategy to design a model-predictive feedback controller based on a pre-built power model and a performance prediction model to constraint the peak power consumption of a server. In contrast to the existing strategies, this gray-box approach uses the performance events, which bring more insights of the behaviors and power consumption of a system, for the purpose of model prediction. We implemented a prototype of this controller and evaluated it using SPECweb2005 benchmark on a web server. This controller can settle the power consumption below the power cap within 2 control periods for more than 75% of the power overloading regardless of workload variations, outperforming black-box approaches. Meanwhile, the performance of application can be maximized with this controller.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"545 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132446287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous Mini-rank: Adaptive, Power-Efficient Memory Architecture","authors":"Kun Fang, Hongzhong Zheng, Zhichun Zhu","doi":"10.1109/ICPP.2010.11","DOIUrl":"https://doi.org/10.1109/ICPP.2010.11","url":null,"abstract":"Memory power consumption has become a big concern in server platforms. A recently proposed mini-rank architecture reduces the memory power consumption by breaking each DRAM rank into multiple narrow mini-ranks and activating fewer devices for each request. However, its fixed and uniform configuration may degrade performance significantly or lose power saving opportunities on some workloads. We propose a heterogeneous mini-rank design that sets the near-optimal configuration for each workload based on its memory access behavior and its memory bandwidth requirement. Compared with the original, homogeneous mini-rank design, the heterogeneous mini-rank design can balance between the performance and power saving and avoid large performance loss. For instance, for multiprogramming workloads with SPEC2000 application running on a quad-core system with two-channel DDR3-1066 memory, on average, the heterogeneous mini-rank can reduce the memory power by 53.1% (up to 60.8%) with the performance loss of 4.6% (up to 11.1%), compared with a conventional memory system. In comparison, the x32 homogeneous mini-rank can only save memory power by up to 29.8%; and the x8 homogeneous mini-rank will cause performance loss by up to 22.8%. Compared with x16 homogeneous mini-rank configuration, it can further reduce the EDP (energy-delay product) by up to 15.5% (10.0% on average).","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"50 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114027732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Scalable Parallel Algorithm for Large-Scale Protein Sequence Homology Detection","authors":"Changjun Wu, A. Kalyanaraman, W. Cannon","doi":"10.1109/ICPP.2010.41","DOIUrl":"https://doi.org/10.1109/ICPP.2010.41","url":null,"abstract":"Protein sequence homology detection is a fundamental problem in computational molecular biology, with a pervasive application in nearly all analyses that aim to structurally and functionally characterize protein molecules. While detecting homology between two protein sequences is computationally inexpensive, detecting pairwise homology at a large-scale becomes prohibitive, requiring millions of CPU hours. Yet, there is currently no efficient method available to parallelize this kernel. In this paper, we present the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm that is suited for large-scale protein sequence data. Our method, called pGraph, is designed using a hierarchical multiple-master multiple-worker model, where the processor space is partitioned into subgroups and the hierarchy helps in ensuring the workload is load balanced fashion despite the inherent irregularity that may originate in the input. Experimental evaluation demonstrates that our method scales linearly on all input sizes tested (up to 640K sequences) on a 1,024 node supercomputer. In addition to demonstrating strong scaling, we present an extensive study of the various components of the system and related parametric studies.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122318280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Characterizing the Relation Between Apex-Map Synthetic Probes and Reuse Distance Distributions","authors":"K. Ibrahim, E. Strohmaier","doi":"10.1109/ICPP.2010.43","DOIUrl":"https://doi.org/10.1109/ICPP.2010.43","url":null,"abstract":"Characterizing a memory reference stream using reuse distance distribution can enable predicting the performance on a given architecture. Benchmarks can subject an architecture to a limited set of reuse distance distributions, but it cannot exhaustively test it. In contrast, Apex-Map, a synthetic memory probe with parameterized locality, can provide a better coverage of the machine use scenarios. Unfortunately, it requires a lot of expertise to relate an application memory behavior to an Apex-Map parameter set. In this work we present a mathematical formulation that describes the relation between Apex-Map and reuse distance distributions. We also introduce a process through which we can automate the estimation of Apex-Map locality parameters for a given application. This process finds the best parameters for Apex-Map probes that generate a reuse distance distribution similar to that of the original application. We tested this scheme on benchmarks from Scalable Synthetic Compact Applications and Unbalanced Tree Search, and we show that this scheme provides an accurate Apex-Map parameterization with a small percentage of mismatch in reuse distance distributions, about 3% in average and less than 8% in the worst case, on the tested applications.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114721559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lightweight, GPU-Based Software RAID System","authors":"M. Curry, L. Ward, A. Skjellum, R. Brightwell","doi":"10.1109/ICPP.2010.64","DOIUrl":"https://doi.org/10.1109/ICPP.2010.64","url":null,"abstract":"While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than offered by current implementations. Traditionally, RAID capabilities have been implemented largely in hardware in order to achieve the best performance possible, but hardware RAID has rigid designs that are costly to change. Software implementations are much more flexible, but software RAID has historically been viewed as much less capable of high throughput than hardware RAID controllers. This work presents a system, Gibraltar RAID, that attains high RAID performance by offloading the calculations related to error correcting codes to GPUs. This paper describes the architecture, performance, and qualities of the system. A comparison to a well-known software RAID implementation, the md driver included with the Linux operating system, is presented. While this work is presented in the context of high performance computing, these findings also apply to a general RAID market.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121917656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Building Efficient Content-Based Publish/Subscribe Systems over Structured P2P Overlays","authors":"S. Zhang, Ji Wang, Rui Shen, Jie Xu","doi":"10.1109/ICPP.2010.33","DOIUrl":"https://doi.org/10.1109/ICPP.2010.33","url":null,"abstract":"In this paper, we introduce a generic model to deal with the event matching problem of content-based publish/subscribe systems over structured P2P overlays. In this model, we claim that there are three methods (event-oriented, subscription-oriented and hybrid) to make all the matched pairs (event, subscription) meet in a system. By theoretically analyzing the inherent problem of both event-oriented and subscription-oriented methods, we propose PEM (Popularity-based Event Matching), a variant of hybrid method. PEM can achieve better trade-off between event processing load and subscription storage load of a system. PEM has been verified through both mathematical and simulation-based evaluation.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124724712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Power Management in Heterogeneous Multi-tier Web Clusters","authors":"Peijian Wang, Yongwei Qi, Xue Liu, Ying Chen, Xiao Zhong","doi":"10.1109/ICPP.2010.46","DOIUrl":"https://doi.org/10.1109/ICPP.2010.46","url":null,"abstract":"Complex web applications are usually served by multi-tier web clusters. With the growing cost of energy, the importance of reducing power consumption in server systems is now well-known and has become a major research topic. However, most of previous works focused solely on homogeneous clusters. This paper addresses the challenge of power management in Heterogeneous Multi-tier Web Clusters. We apply Generalized Benders Decomposition (GBD) to decompose the global optimization problem into small sub-problems. This algorithm achieves the optimal solution in an iterative fashion. The simulation results show that our algorithm achieve more energy conservation than the previous works.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"1999 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116921254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dual-Phase Just-in-Time Workflow Scheduling in P2P Grid Systems","authors":"S. Di, Cho-Li Wang","doi":"10.1109/ICPP.2010.31","DOIUrl":"https://doi.org/10.1109/ICPP.2010.31","url":null,"abstract":"This paper presents a fully decentralized just-in-time workflow scheduling method in a P2P Grid system. The proposed solution allows each peer node to autonomously dispatch inter-dependent tasks of workflows to run on geographically distributed computers. To reduce the workflow completion time and enhance the overall execution efficiency, not only does each node perform as a scheduler to distribute its tasks to execution nodes (or resource nodes), but the resource nodes will also set the execution priorities for the received tasks. By taking into account the unpredictability of tasks’ finish time, we devise an efficient task scheduling heuristic, namely dynamic shortest makespan first (DSMF), which could be applied at both scheduling phases for determining the priority of the workflow tasks. We compare the performance of the proposed algorithm against seven other heuristics by simulation. Our algorithm achieves 20%~60% reduction on the average completion time and 37.5%~90% improvement on the average workflow execution efficiency over other decentralized algorithms.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121462855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SAM: A Semantic-Aware Multi-tiered Source De-duplication Framework for Cloud Backup","authors":"Yujuan Tan, Hong Jiang, D. Feng, Lei Tian, Zhichao Yan, Guohui Zhou","doi":"10.1109/ICPP.2010.69","DOIUrl":"https://doi.org/10.1109/ICPP.2010.69","url":null,"abstract":"Existing de-duplication solutions in cloud backup environment either obtain high compression ratios at the cost of heavy de-duplication overheads in terms of increased latency and reduced throughput, or maintain small de-duplication overheads at the cost of low compression ratios causing high data transmission costs, which results in a large backup window. In this paper, we present SAM, a Semantic-Aware Multitiered source de-duplication framework that first combines the global file-level de-duplication and local chunk-level deduplication, and further exploits file semantics in each stage in the framework, to obtain an optimal tradeoff between the deduplication efficiency and de-duplication overhead and finally achieve a shorter backup window than existing approaches. Our experimental results with real world datasets show that SAM not only has a higher de-duplication efficiency/overhead ratio than existing solutions, but also shortens the backup window by an average of 38.7%.","PeriodicalId":180554,"journal":{"name":"2010 39th International Conference on Parallel Processing","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132196372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}