2014 43rd International Conference on Parallel Processing: Latest Papers

A Compiler Extension for Parallel Matrix Programming
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.56
Kevin Williams, Matthew Le, Ted Kaminski, E. V. Wyk
Abstract: This paper describes a compiler extension to our prototype extensible C translator that adds new features for parallel execution of matrix operations and shows their application to problems in spatio-temporal data mining. The extension provides new language features for constructing new matrices, mapping functions over elements of a matrix, and accumulating operations that, for example, can sum values in a matrix. It also provides the appropriate semantic analysis to check for errors before translating the constructs down to parallel C code. The extension also provides features that let the programmer indicate how the extension translates these matrix constructs down to C code. Programmers seeking higher levels of performance can specify how the underlying for-loops are structured so that code using, for example, loop-tiling techniques or vector processors is generated. In general, compiler extensions supported by our approach allow new domain-specific syntax and semantic analyses to be easily added to the host language. Specifications of the host C language and the extensions are composed to create a custom translator that maps extended C programs down to plain (parallel) C code, checking for domain-specific errors and applying high-level domain-specific optimizations in the process.
Citations: 1
A Batch System with Fair Scheduling for Evolving Applications
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.44
Suraj Prabhakaran, Mohsin Iqbal, S. Rinke, Christian Windisch, F. Wolf
Abstract: Cluster batch systems usually support only static allocation of resources to applications before job start. After job start, applications cannot increase or decrease their resource set. However, some applications unpredictably evolve during execution and thus may require additional resources. If the extra resources cannot be delivered during runtime, those applications may have to run longer to finish, or may not even be able to finish before their job's time slice expires. Likewise, a job may have to end without additional resources due to hardware limits being reached, such as the memory available to the compute node. To avoid such scenarios, users have to make large static allocations to account for a potential demand for resources. This wastes resources, as they idle before they might actually be used at an unknown point. In this paper, we propose a batch system with dynamic allocation facilities to support on-the-fly resource allocation to unpredictably evolving jobs based on demand. We present a novel dynamic resource allocation strategy that also accounts for a fair assignment of resources between the usual rigid jobs and the evolving jobs. The results for a CFD production application and a mixed workload of rigid and evolving jobs (based on the widely used ESP benchmark) show that our system not only reduces the job waiting and job turnaround times, but also increases system utilization and system throughput.
Citations: 15
Fast Biological Sequence Comparison on Hybrid Platforms
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.59
S. Kedad-Sidhoum, F. Mendonca, Florence Monna, G. Mounié, D. Trystram
Abstract: Today, many high performance computing platforms use hybrid architectures combining multi-core processors and hardware accelerators like GPUs (Graphics Processing Units). This paper presents a new method for scheduling tasks for biological sequence comparison applications with CPUs and GPUs. This strategy is called SWDUAL and is based on a dual approximation scheme for determining which tasks are most suitable to be executed on the GPUs. The objective is to obtain fast execution time and minimize the idle time on each PE (Processing Element). It is implemented using a master-slave model. Results obtained when sequences were compared to five public genomic databases show that this method reduces the execution time on hybrid platforms compared to other publicly available implementations.
Citations: 5
CUDA-Accelerated Alignment of Subsequences in Streamed Time Series Data
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.10
Christian Hundt, B. Schmidt, E. Schömer
Abstract: Euclidean Distance (ED) and Dynamic Time Warping (DTW) are cornerstones in the field of time series data mining. Many high-level algorithms like kNN-classification, clustering or anomaly detection make heavy use of these distance measures as subroutines. Furthermore, the vast growth of recorded data produced by automated monitoring systems or integrated sensors establishes the need for efficient implementations. In this paper, we introduce linear memory parallelization schemes for the alignment of a given query Q in a stream of time series data S for both ED and DTW using CUDA-enabled accelerators. The ED parallelization features a log-linear calculation scheme, in contrast to the naive implementation with quadratic time complexity, which allows for more efficient processing of long queries. The DTW implementation makes extensive use of a lower-bound cascade to avoid expensive calculations for unpromising candidates. Our CUDA parallelizations for both ED and DTW outperform state-of-the-art algorithms, namely the UCR-Suite. The gained speedups range from one to two orders of magnitude, which allows for significantly faster processing of much larger data streams.
Citations: 9
A Framework for Data Protection in Cloud Federations
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.37
Lena Mashayekhy, Mahyar Movahed Nejad, Daniel Grosu
Abstract: One of the benefits of cloud computing is that a cloud provider can dynamically scale up its resource capabilities by forming a cloud federation with other cloud providers. Forming cloud federations requires taking data privacy and security concerns into account, which is critical in satisfying the Service Level Agreements (SLAs). The nature of privacy and security challenges in clouds requires that cloud providers design data protection mechanisms that work together with their resource management systems. In this paper, we consider the privacy requirements when outsourcing data and computation within a federation of clouds, and propose a framework for minimizing the cost of outsourcing while considering two key data protection restrictions, the trust and disclosure restrictions. We model these restrictions as conflict graphs, and formulate the problem as an integer program. In the absence of computationally tractable optimal algorithms for solving this problem, we design a fast heuristic algorithm. We analyze the performance of our proposed algorithm through extensive experiments.
Citations: 9
Analysis and Design of Fault-Tolerant Scheduling for Real-Time Tasks on Earth-Observation Satellites
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.58
Xiaomin Zhu, Jianjiang Wang, Ji Wang, X. Qin
Abstract: Fault-tolerant scheduling is an efficient approach to improving the reliability of multiple earth-observing satellites, especially in emergency scenarios such as obtaining photographs of battlefields or earthquake areas. Unfortunately, little work has been done on fault-tolerant scheduling for satellites. To address this issue, this paper presents a novel dynamic fault-tolerant scheduling model using a primary-backup policy to tolerate one satellite's permanent failure at any one time instant. On this basis, we propose a novel fault-tolerant satellite scheduling algorithm named FTSS, in which an overlapping technology is adopted to improve resource utilization. In addition, FTSS employs task merging strategies to further enhance schedulability. To demonstrate the superiority of FTSS, we conduct extensive simulation experiments using real-world satellite parameters from STK to compare FTSS with other baseline algorithms. The experimental results indicate that FTSS improves scheduling quality over the other algorithms and is suitable for fault-tolerant satellite scheduling.
Citations: 5
NEO: A Nonblocking Hybrid Switch Architecture for Large Scale Data Centers
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.60
Zhemin Zhang, Yuanyuan Yang
Abstract: As the scale of data centers and cloud computing applications increases, data center networks play a critical role in meeting the huge communication bandwidth requirement of such applications. The scalability of conventional electronic data center networks is limited by wiring complexity and the reaching distance of links under a fixed power budget. To overcome this problem, in this paper we propose a nonblocking hybrid switch architecture, called NEO (Nonblocking Electronic and Optical), which is able to provide nonblocking interconnections for as many as 1,000,000 servers in a data center. NEO maintains electronic interconnections for intra-pod networks, while providing inter-pod interconnections by optical core switches, which not only increases the scalability of the switch architecture, but also lowers the switch cost and power consumption compared to other existing optical switch architectures. We also design a packet scheduler for NEO, which adopts a credit flow control mechanism and a parallel scheduling algorithm to avoid packet loss and provide low communication latency. Our simulation results demonstrate that NEO achieves very low average packet delay compared to other existing optical switching architectures under various traffic patterns.
Citations: 1
Modeling the Energy Efficiency of Heterogeneous Clusters
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.41
Lavanya Ramapantulu, B. Tudor, Dumitrel Loghin, Trang Vu, Y. M. Teo
Abstract: Traditional datacenter systems advocate the use of high-performance hardware, resulting in increased power consumption and cooling costs. With increasing availability of systems having diverse performance-to-power ratios, we analyze the energy efficiency of mixing high-performance and low-power nodes in a cluster. Using a model-driven analysis, we predict the heterogeneous mix of nodes that is the most energy-efficient while maintaining a given deadline. Considering service demands of the workloads on cores, memory and I/O devices, we derive Pareto-optimal configurations by matching the execution rate of different nodes. Our mix and match approach determines heterogeneous configurations that exhibit a "sweet region", where energy usage reduces linearly as the deadline is relaxed. Our analysis shows that mixing high-performance and low-power nodes is more energy-efficient than homogeneous datacenter clusters.
Citations: 10
Software-Managed Power Reduction in Infiniband Links
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.40
Branimir Dickov, M. Pericàs, P. Carpenter, N. Navarro, E. Ayguadé
Abstract: The backbone of a large-scale supercomputer is the interconnection network. As compute nodes become more energy-efficient, the interconnect is accounting for an increasing proportion of the total system energy consumption. The interconnect's energy consumption is, however, only starting to receive serious attention. Some hardware-based schemes have been proposed that exploit idle periods or low utilisation, either by turning off the links or by lowering the frequency and voltage. Although these schemes are effective in certain cases, they do not have enough global information about the application's communication behaviour to efficiently manage the network power consumption. This paper proposes an alternative approach: moving the intelligence into the PMPI layer of the MPI library, and using prediction to discover repetitive patterns in the application's communication behaviour. The core of the prediction algorithm is an n-gram extraction technique, which can accurately predict not only when a link will become unused but also when it will become active again, allowing lanes to be switched off during the idle periods and switched back on again in time to avoid incurring a significant performance degradation. Many HPC applications benefit from prediction, since they have repetitive computation and communication phases. Because the energy-saving mechanism is implemented inside the MPI library, existing MPI programs do not need to be modified. Using an event-driven simulator, driven by representative HPC workloads, we demonstrate average energy savings in Infiniband switches up to around 33%, while the average execution time increase is only up to 1%.
Citations: 12
Measuring Effective Work to Reward Success in Dynamic Transaction Scheduling
2014 43rd International Conference on Parallel Processing. Pub Date: 2014-10-18. DOI: 10.1109/ICPP.2014.23
M. Pereira, J. N. Amaral, G. Araújo
Abstract: One of the greatest challenges of modern computing is the development of software optimized for parallel execution in multi-core processors. Transactional Memory (TM) is a new trend in concurrency control that has emerged to address these challenges. TM promises the performance of finer grain locks combined with lower programming complexity. However, transactional memories are speculative and rely on contention managers to resolve conflicts between transactions. This paper explores a complementary approach to boost the performance of TM through the use of schedulers. A TM scheduler is a software component that decides when a particular transaction should be executed. TM scheduling mechanisms are typically restricted to either serialization or yielding. Moreover, their effectiveness is very sensitive to the accuracy of the metric used to predict transaction behavior, particularly in high-contention scenarios. This paper proposes a new Dynamic Transaction Scheduler (DTS) to select a transaction to execute next, based on a new policy that rewards success and uses an improved metric that measures the amount of effective work performed by a transaction. An experimental evaluation indicates that scheduling transactions based on DTS can provide good average-case performance.
Citations: 1