2014 43rd International Conference on Parallel Processing: Latest Papers, Page 4

A Compiler Extension for Parallel Matrix Programming

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.56

Kevin Williams, Matthew Le, Ted Kaminski, E. V. Wyk

Abstract: This paper describes a compiler extension to our prototype extensible C translator that adds new features for parallel execution of matrix operations and shows their application to problems in spatio-temporal data mining. The extension provides new language features for constructing new matrices, mapping functions over elements of a matrix, and accumulating operations that, for example, can sum values in a matrix. It also provides the appropriate semantic analysis to check for errors before translating the constructs down to parallel C code. The extension also provides features that let the programmer indicate how the extension translates these matrix constructs down to C code. Programmers seeking higher levels of performance can specify how the underlying for-loops are structured so that code using, for example, loop-tiling techniques or vector processors, is generated. In general, compiler extensions supported by our approach allow new domain-specific syntax and semantic analyses to be easily added to the host language. Specifications of the host C language and the extensions are composed to create a custom translator that maps extended C programs down to plain (parallel) C code, checking for domain-specific errors and applying high-level domain-specific optimizations in the process.

Citations: 1
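The abstract of the compiler-extension paper above mentions loop tiling as one way to structure the for-loops behind an accumulation such as a matrix sum. The following is a minimal sketch of that loop shape only. It is written in Python for readability, whereas the paper's translator emits C, and the function name `tiled_sum` and the `tile` parameter are hypothetical and do not come from the paper.

```python
def tiled_sum(matrix, tile=2):
    """Sum all elements of a rectangular list-of-lists matrix, tile by tile.

    Hypothetical illustration of loop tiling; not the paper's generated code.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    total = 0
    # Outer loops step from one tile origin to the next.
    for ii in range(0, rows, tile):
        for jj in range(0, cols, tile):
            # Inner loops visit the elements of one tile, clipped at the edges.
            for i in range(ii, min(ii + tile, rows)):
                for j in range(jj, min(jj + tile, cols)):
                    total += matrix[i][j]
    return total
```

In a compiled language the benefit of this shape is cache locality, since each tile is fully consumed before the next one is touched. The Python version shows only the iteration order and is not meant as a performance demonstration.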

A Batch System with Fair Scheduling for Evolving Applications

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.44

Suraj Prabhakaran, Mohsin Iqbal, S. Rinke, Christian Windisch, F. Wolf

Abstract: Cluster batch systems usually support only static allocation of resources to applications before job start. After job start, applications cannot increase or decrease their resource set. However, some applications unpredictably evolve during execution and thus may require additional resources. If the extra resources cannot be delivered during runtime, those applications may have to run longer to finish, or are not even able to finish when their job's time slice expires. Likewise, a job may have to end without additional resources due to hardware limits being reached, such as the memory available to the compute node. To avoid such scenarios, users have to make large static allocations to account for a potential demand for resources. This leads to wastage of resources as they idle before they might actually be used at an unknown point. In this paper, we propose a batch system with dynamic allocation facilities to support on-the-fly resource allocation to unpredictably evolving jobs based on demand. We present a novel dynamic resource allocation strategy that also accounts for a fair assignment of resources between the usual rigid jobs and the evolving jobs. The results for a CFD production application and a mixed workload of rigid and evolving jobs (based on the widely used ESP benchmark) show that our system not only reduces the job waiting and job turnaround times, but also increases system utilization and system throughput.

Citations: 15

Fast Biological Sequence Comparison on Hybrid Platforms

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.59

S. Kedad-Sidhoum, F. Mendonca, Florence Monna, G. Mounié, D. Trystram

Citations: 5

CUDA-Accelerated Alignment of Subsequences in Streamed Time Series Data

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.10

Christian Hundt, B. Schmidt, E. Schömer

Abstract: Euclidean Distance (ED) and Dynamic Time Warping (DTW) are cornerstones in the field of time series data mining. Many high-level algorithms like kNN-classification, clustering or anomaly detection make excessive use of these distance measures as subroutines. Furthermore, the vast growth of recorded data produced by automated monitoring systems or integrated sensors establishes the need for efficient implementations. In this paper, we introduce linear memory parallelization schemes for the alignment of a given query Q in a stream of time series data S for both ED and DTW using CUDA-enabled accelerators. The ED parallelization features a log-linear calculation scheme in contrast to the naive implementation with quadratic time complexity which allows for more efficient processing of long queries. The DTW implementation makes extensive use of a lower-bound cascade to avoid expensive calculations for unpromising candidates. Our CUDA-parallelizations for both ED and DTW outperform state-of-the-art algorithms, namely the UCR-Suite. The gained speedups range from one to two orders-of-magnitude which allows for significantly faster processing of exceedingly bigger data streams.

Citations: 9
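The abstract of the time-series alignment paper above refers to Dynamic Time Warping computed in linear memory. As a rough illustration of what "linear memory" means here, the sketch below computes the textbook DTW cost between two sequences while keeping only two rows of the dynamic-programming table. It is a sequential CPU sketch under stated assumptions: a squared point-wise cost, no warping window, and no lower-bound cascade. It is not the authors' CUDA scheme, and the name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(q, s):
    """Textbook DTW cost between sequences q and s, using O(len(s)) memory.

    Assumes a squared point-wise cost; illustrative only.
    """
    n, m = len(q), len(s)
    # prev[j] holds the best cost of aligning the first i-1 items of q
    # with the first j items of s; index 0 is the empty-prefix boundary.
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = (q[i - 1] - s[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor cells.
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[m]
```

Only `prev` and `curr` are alive at any time, so memory grows with the length of `s` rather than with the product of both lengths.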

A Framework for Data Protection in Cloud Federations

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.37

Lena Mashayekhy, Mahyar Movahed Nejad, Daniel Grosu

Citations: 9

Analysis and Design of Fault-Tolerant Scheduling for Real-Time Tasks on Earth-Observation Satellites

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.58

Xiaomin Zhu, Jianjiang Wang, Ji Wang, X. Qin

Abstract: Fault-tolerant scheduling is an efficient approach to improving the reliability of multiple earth-observing satellites especially in some emergent scenarios such as obtaining photographs on battlefields or earthquake areas. Unfortunately, little work has been done to deal with the fault-tolerant scheduling on satellites. To address this issue, this paper presents a novel dynamic fault-tolerant scheduling model using primary-backup policy to tolerate one satellite's permanent failure at one time instant. On this basis, we propose a novel fault-tolerant satellite scheduling algorithm named FTSS, in which an overlapping technology is adopted to improve the resource utilization. Besides, the FTSS employs the task merging strategies to further enhance the schedulability. To demonstrate the superiority of our FTSS, we conduct extensive experiments by simulations using real-world satellite parameters from STK to compare FTSS with other baseline algorithms. The experimental results indicate that FTSS efficiently improves the scheduling quality of others and is suitable for fault-tolerant satellite scheduling.

Citations: 5

NEO: A Nonblocking Hybrid Switch Architecture for Large Scale Data Centers

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.60

Zhemin Zhang, Yuanyuan Yang

Abstract: As the scale of data centers and cloud computing applications increases, data center networks play a critical role in meeting the huge communication bandwidth requirement of such applications. The scalability of conventional electronic data center networks is limited by wiring complexity and reaching distance of links under fixed power budget. To overcome this problem, in this paper we propose a nonblocking hybrid switch architecture, called NEO (Nonblocking Electronic and Optical), which is able to provide nonblocking interconnections for as many as 1,000,000 servers in a data center. NEO maintains electronic interconnections for intra-pod networks, while providing interpod interconnections by optical core switches, which not only increases the scalability of the switch architecture, but also lowers the switch cost and power consumption compared to other existing optical switch architectures. We also design a packet scheduler for NEO, which adopts a credit flow control mechanism and a parallel scheduling algorithm to avoid packet loss, and provide low communication latency. Our simulation results demonstrate that NEO achieves very low average packet delay compared to other existing optical switching architectures under various traffic patterns.

Citations: 1

Modeling the Energy Efficiency of Heterogeneous Clusters

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.41

Lavanya Ramapantulu, B. Tudor, Dumitrel Loghin, Trang Vu, Y. M. Teo

Citations: 10

Software-Managed Power Reduction in Infiniband Links

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.40

Branimir Dickov, M. Pericàs, P. Carpenter, N. Navarro, E. Ayguadé

Abstract: The backbone of a large-scale supercomputer is the interconnection network. As compute nodes become more energy-efficient, the interconnect is accounting for an increasing proportion of the total system energy consumption. The interconnect's energy consumption is, however, only starting to receive serious attention. Some hardware-based schemes have been proposed that exploit idle periods or low utilisation, either by turning off the links or by lowering the frequency and voltage. Although these schemes are effective in certain cases, they do not have enough global information about the application's communication behaviour to efficiently manage the network power consumption. This paper proposes an alternative approach: moving the intelligence into the PMPI layer of the MPI library, and using prediction to discover repetitive patterns in the application's communication behaviour. The core of the prediction algorithm is an n-gram extraction technique, which can accurately predict not only when a link will become unused but also when it will become active again, allowing lanes to be switched off during the idle periods and switched back on again in time to avoid incurring a significant performance degradation. Many HPC applications benefit from prediction, since they have repetitive computation and communication phases. By implementing the energy-saving mechanism inside the MPI library, existing MPI programs do not need to be modified. Using an event-driven simulator, driven by representative HPC workloads, we demonstrate average energy savings in Infiniband switches up to around 33%, while the average execution time increase is only up to 1%.

Citations: 12
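The abstract of the Infiniband power-reduction paper above names n-gram extraction as the core of its predictor for repetitive communication behaviour. The sketch below shows the general idea of an n-gram next-event predictor over a trace of event labels. It is a generic illustration, not the paper's algorithm: the event labels, the function names `build_ngram_model` and `predict_next`, and the choice of the most frequent follower as the prediction are all assumptions, and timing information (when a link becomes idle or active) is not modelled.

```python
from collections import Counter, defaultdict


def build_ngram_model(events, n=3):
    """Count which event follows each context of n-1 events (assumes n >= 2)."""
    model = defaultdict(Counter)
    for i in range(len(events) - n + 1):
        context = tuple(events[i:i + n - 1])
        model[context][events[i + n - 1]] += 1
    return model


def predict_next(model, recent, n=3):
    """Return the most frequent follower of the last n-1 events, or None."""
    context = tuple(recent[-(n - 1):])
    followers = model.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical trace of a loop that repeats compute, send, recv.
trace = ["compute", "send", "recv"] * 4
model = build_ngram_model(trace)
```

With this trace, `predict_next(model, ["compute", "send"])` yields `"recv"`, which is the kind of regularity a power manager could use to decide that a link is about to be needed again.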

Measuring Effective Work to Reward Success in Dynamic Transaction Scheduling

2014 43rd International Conference on Parallel Processing Pub Date : 2014-10-18 DOI: 10.1109/ICPP.2014.23

M. Pereira, J. N. Amaral, G. Araújo

Abstract: One of the greatest challenges of modern computing is the development of software optimized for parallel execution in multi-core processors. Transactional Memory (TM) is a new trend in concurrency control that has emerged to address these challenges. TM promises the performance of finer grain locks combined with lower programming complexity. However, transactional memories are speculative and rely on contention managers to resolve conflicts between transactions. This paper explores a complementary approach to boost the performance of TM through the use of schedulers. A TM scheduler is a software component that decides when a particular transaction should be executed. TM scheduling mechanisms are typically restricted to either serialization or yielding. Moreover, their effectiveness is very sensitive to the accuracy of the metric used to predict transaction behavior, particularly in high-contention scenarios. This paper proposes a new Dynamic Transaction Scheduler (DTS) to select a transaction to execute next, based on a new policy that rewards success and uses an improved metric that measures the amount of effective work performed by a transaction. An experimental evaluation indicates that scheduling transactions based on DTS can provide good average-case performance.

Citations: 1