2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum: Latest Papers, Page 5

Towards the Design of Systolic Genetic Search

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.220

M. Pedemonte, E. Alba, F. Luna

Citations: 17

Energy Efficiency Analysis of GPUs

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.124

J. M. Cebrian, Ginés D. Guerrero, José M. García

Abstract: In the last few years, Graphics Processing Units (GPUs) have become a great tool for massively parallel computing. GPUs are specifically designed for throughput and face several design challenges, especially what are known as the Power and Memory Walls. In these devices, available resources should be used to enhance performance and throughput, as the performance per watt is really high. For massively parallel applications or kernels, using the available silicon resources for power management was unproductive, as the main objective of the unit was to execute the kernel as fast as possible. However, not all the applications that are currently being ported to GPUs can make use of all the available resources, whether due to data dependencies, bandwidth requirements, or legacy software running on new hardware, and this reduces the performance per watt. This new scenario requires new designs and optimizations to make these GPGPUs more energy efficient. But first things first: we should begin by analyzing the applications we are running on these processors, looking for bottlenecks and opportunities to optimize for energy efficiency. In this paper we analyze some kernels taken from the CUDA SDK2 in order to discover resource underutilization. Results show that this underutilization is present, and that resource optimization can increase the energy efficiency of GPU-based computation. We then discuss different strategies and proposals to increase energy efficiency in future GPU designs.

Citations: 29

Modeling for Synthesis with System#

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.61

C. Köllner, Francisco Mendoza, K. Müller-Glaser

Citations: 1

Task Scheduling in Large-scale Distributed Systems Utilizing Partial Reconfigurable Processing Elements

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.6

F. Nadeem, I. Ashraf, S. A. Ostadzadeh, Stephan Wong, K. Bertels

Abstract: Recent progress in processing speeds, network bandwidths, and middleware technologies has contributed towards novel computing platforms, ranging from large-scale computing clusters to globally distributed systems. Consequently, most current computing systems possess different types of heterogeneous processing resources. Entering into the peta-scale computing era and beyond, reconfigurable processing elements such as Field Programmable Gate Arrays (FPGAs), as well as forthcoming integrated hybrid computing cores, will play a leading role in the design of future distributed systems. Therefore, it is important to develop simulation tools to measure the performance of reconfigurable processors in current and future distributed systems. In this paper, we propose the design of a simulation framework to investigate the performance of reconfigurable processors in distributed systems. The framework adds partial reconfiguration functionality to the reconfigurable nodes. Depending on the available reconfigurable area, each node is able to execute more than one task simultaneously. Furthermore, as a case study, we present a simple task scheduling algorithm to verify the functionality of the simulation framework. The proposed algorithm supports the scheduling of tasks on partially reconfigurable nodes. The simulation results are based on various experiments, and they provide a comparison between full (one node-one task mapping) and partial (one node-multiple tasks mapping) configuration of the nodes, for the same set of parameters in each simulation run. Results suggest that the average wasted area per task is lower under partial configuration than under full configuration, verifying the functionality of the simulation framework.
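The comparison between full and partial configuration described in this abstract can be illustrated with a minimal sketch. It is not the authors' simulation framework or scheduling algorithm: the first-fit placement rule, the definition of "wasted area per task" as unused node area divided by the number of tasks, and the names `wasted_area_full`, `wasted_area_partial`, `NODE_AREA`, and `TASKS` are all assumptions made for illustration.

```python
def wasted_area_full(node_area, task_areas):
    """Full configuration (assumed model): each task gets a whole node."""
    if any(a > node_area for a in task_areas):
        raise ValueError("every task must fit on a single node")
    # The area a task does not use on its node is counted as wasted.
    return sum(node_area - a for a in task_areas) / len(task_areas)


def wasted_area_partial(node_area, task_areas):
    """Partial configuration (assumed model): first-fit, several tasks per node."""
    if any(a > node_area for a in task_areas):
        raise ValueError("every task must fit on a single node")
    free = []  # remaining free area of each node opened so far
    for a in task_areas:
        for i, f in enumerate(free):
            if f >= a:           # first node with enough free area
                free[i] = f - a
                break
        else:                    # no open node fits: open a new one
            free.append(node_area - a)
    # Area left unused on the opened nodes, averaged over tasks.
    return sum(free) / len(task_areas)


NODE_AREA = 100            # hypothetical area units
TASKS = [30, 50, 40, 60]   # hypothetical task area requirements

full = wasted_area_full(NODE_AREA, TASKS)
partial = wasted_area_partial(NODE_AREA, TASKS)
print(full, partial)
```

With these hypothetical inputs the full mapping uses four nodes and the first-fit partial mapping uses two, so the average wasted area per task is smaller in the partial case, which matches the direction of the result the abstract reports.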

Citations: 7

Reducing Migration-induced Cache Misses

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.215

Sajjid Reza, G. Byrd

Citations: 4

An Efficient Property-Based Attestation Scheme with Flexible Revocation Mechanisms

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.150

Yue Xiao-han, Zhou Fucai

Citations: 1

MapReduce across Distributed Clusters for Data-intensive Applications

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.249

Lizhe Wang, J. Tao, H. Marten, A. Streit, S. Khan, J. Kolodziej, Dan Chen

Abstract: Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP), for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed at more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications. However, current MapReduce implementations are developed to operate in single-cluster environments and cannot be leveraged for large-scale distributed data processing across multiple clusters. On the other hand, workflow systems are used for distributed data processing across data centers. It has been reported that the workflow paradigm has some limitations for distributed data processing, such as reliability and efficiency. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters. G-Hadoop uses the Gfarm file system as an underlying file system and executes MapReduce tasks across distributed clusters. Experiments with the G-Hadoop framework on distributed clusters show encouraging results.

Citations: 40

A Unified Study of Epidemic Routing Protocols and their Enhancements

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.187

Zhenxin Feng, Kwan-Wu Chin

Abstract: Epidemic protocols belong to a class of routing paradigms that has wide-ranging applications in Delay Tolerant Networks (DTNs) due to their simplicity, low delays, and little to no reliance on special nodes. A comprehensive study of their performance will therefore serve as an important guide to future protocol designers. Unfortunately, to date, there is no work that studies epidemic routing protocols using a common framework that evaluates their performance objectively using the same mobility model and parameters. To this end, we study four categories of epidemic routing protocols: P-Q epidemic, epidemic with Time-To-Live (TTL), epidemic with Encounter Count (EC), and epidemic with immunity table. Our results show that the probability of transmissions as used in P-Q epidemic may increase delay and decrease delivery ratio. Apart from that, an incorrect TTL value leads to premature discarding of bundles and thereby has a non-negligible impact on delivery ratio. Epidemic with EC suffers from high buffer occupancy levels and long delivery delays. In addition, epidemic with immunity suffers from high overheads. Hence, we propose three enhancements, dynamic TTL, EC+TTL, and cumulative immunity, to address the aforementioned limitations. Our results show that dynamic TTL improves delivery ratio by more than 20%, and that EC+TTL reduces buffer occupancy level by 40% and improves delivery ratio by at least 40% at high loads. Cumulative immunity reduces the buffer occupancy level of nodes by at least 15% whilst incurring an order of magnitude less signaling overhead.
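The abstract's point that a badly chosen TTL discards bundles prematurely can be seen in a toy sketch of epidemic forwarding with a TTL. This is not the protocol or simulator evaluated in the paper: the contact-list model, a TTL counted from bundle creation at time 0, and the names `epidemic_with_ttl` and `CONTACTS` are assumptions made for illustration.

```python
def epidemic_with_ttl(contacts, source, dest, ttl):
    """Return the time at which the bundle reaches dest, or None if it expires.

    contacts: list of (time, node_a, node_b) tuples sorted by time.
    ttl: bundle lifetime measured from its creation at time 0 (assumed model).
    """
    carriers = {source}  # nodes currently holding a copy of the bundle
    for t, a, b in contacts:
        if t >= ttl:
            return None  # all copies have been discarded
        if a in carriers or b in carriers:
            carriers.update((a, b))  # epidemic step: copy on every contact
        if dest in carriers:
            return t
    return None


# Hypothetical contact trace: A meets B, then B meets C, then C meets D.
CONTACTS = [(1, "A", "B"), (3, "B", "C"), (6, "C", "D")]

delivered_at = epidemic_with_ttl(CONTACTS, "A", "D", ttl=10)
expired = epidemic_with_ttl(CONTACTS, "A", "D", ttl=5)
print(delivered_at, expired)
```

In this trace the last hop happens at time 6, so a TTL of 10 delivers the bundle while a TTL of 5 drops it one contact short of the destination.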

Citations: 27

Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.63

S. Ramalingam, Mary W. Hall, Chun Chen

Abstract: Scientific libraries are written in a general way in anticipation of a variety of use cases, which reduces optimization opportunities. Significant performance gains can be achieved by specializing library code to its execution context: the application in which it is invoked, the input data set used, the architectural platform, and its backend compiler. Such specialization is not typically done because it is time consuming, leads to nonportable code, and requires performance-tuning expertise that application scientists may not have. Tool support for library specialization in the above context could potentially reduce the extensive understanding required while significantly improving performance, code reuse, and portability. In this work, we study the performance gains achieved by specializing the single-processor sparse linear algebra functions in PETSc (Portable, Extensible Toolkit for Scientific Computation) in the context of three scalable scientific applications on the Hopper Cray XE6 Supercomputer at NERSC. We use CHiLL (Composable High-Level Loop Transformation Framework) to apply source-level transformations tailored to the special needs of sparse computations and automatically generate highly optimized PETSc functions. We demonstrate significant performance improvements of more than 1.8X on the library functions and overall gains of 9 to 24% on three scalable applications that use PETSc's sparse matrix capabilities.

Citations: 11

Compile-Time Detection of False Sharing via Loop Cost Modeling

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum Pub Date : 2012-05-21 DOI: 10.1109/IPDPSW.2012.67

M. Tolubaeva, Yonghong Yan, B. Chapman

Citations: 3