2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum: Latest Papers

Towards the Design of Systolic Genetic Search
M. Pedemonte, E. Alba, F. Luna
{"title":"Towards the Design of Systolic Genetic Search","authors":"M. Pedemonte, E. Alba, F. Luna","doi":"10.1109/IPDPSW.2012.220","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.220","url":null,"abstract":"This paper elaborates on a new parallel optimization algorithm specially engineered to run on Graphics Processing Units (GPUs). The underlying operation relates to Systolic Computation. The algorithm, called Systolic Genetic Search (SGS), is based on the synchronous circulation of solutions through a grid of processing units and tries to profit from the parallel architecture of GPUs. The proposed model has been shown to outperform a random search and two genetic algorithms for solving the Knapsack Problem over a set of increasingly sized instances. Additionally, the parallel implementation of SGS on a GeForce GTX 480 graphics processing unit (GPU) obtains a runtime reduction of up to 35 times.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127451237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
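The abstract above describes solutions circulating synchronously through a grid of processing units. The following is only a toy, CPU-side sketch of that general idea applied to the Knapsack Problem. It is not the authors' SGS: the ring topology, the crossover and mutation operators, the zero-value penalty for overweight selections, and every name here (`knapsack_value`, `systolic_search`, `cells`, `steps`) are assumptions made for illustration.

```python
import random

def knapsack_value(bits, weights, values, capacity):
    """Value of a 0/1 selection, or 0 if it exceeds the capacity (assumed penalty)."""
    weight = sum(w for b, w in zip(bits, weights) if b)
    if weight > capacity:
        return 0
    return sum(v for b, v in zip(bits, values) if b)

def systolic_search(weights, values, capacity, cells=8, steps=50, seed=0):
    """Toy search: solutions shift through a ring of cells in lock step."""
    rng = random.Random(seed)
    n = len(weights)

    def fitness(solution):
        return knapsack_value(solution, weights, values, capacity)

    # One random solution per cell to start with.
    ring = [[rng.randint(0, 1) for _ in range(n)] for _ in range(cells)]
    for _ in range(steps):
        # Synchronous circulation: every solution moves one cell to the right.
        incoming = ring[-1:] + ring[:-1]
        new_ring = []
        for here, arrived in zip(ring, incoming):
            # The cell recombines the arriving solution with the one it held...
            child = [rng.choice(pair) for pair in zip(here, arrived)]
            # ...flips one randomly chosen bit...
            child[rng.randrange(n)] ^= 1
            # ...and keeps the child only if it is no worse than the arrival.
            new_ring.append(child if fitness(child) >= fitness(arrived) else arrived)
        ring = new_ring
    return max(ring, key=fitness)
```

Because each cell never replaces an arriving solution with a worse one, the best value present in the ring cannot decrease from one step to the next. On a GPU, each cell's update would be a natural candidate for one thread or thread block, but that mapping is not shown here.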
Energy Efficiency Analysis of GPUs
J. M. Cebrian, Ginés D. Guerrero, José M. García
{"title":"Energy Efficiency Analysis of GPUs","authors":"J. M. Cebrian, Ginés D. Guerrero, José M. García","doi":"10.1109/IPDPSW.2012.124","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.124","url":null,"abstract":"In the last few years, Graphics Processing Units (GPUs) have become a great tool for massively parallel computing. GPUs are specifically designed for throughput and face several design challenges, especially what are known as the Power and Memory Walls. In these devices, available resources should be used to enhance performance and throughput, as the performance per watt is really high. For massively parallel applications or kernels, using the available silicon resources for power management was unproductive, as the main objective of the unit was to execute the kernel as fast as possible. However, not all the applications that are currently being ported to GPUs can make use of all the available resources, due to data dependencies, bandwidth requirements, legacy software on new hardware, etc., reducing the performance per watt. This new scenario requires new designs and optimizations to make these GPGPUs more energy efficient. But first things first: we should begin by analyzing the applications we are running on these processors, looking for bottlenecks and opportunities to optimize for energy efficiency. In this paper we analyze some kernels taken from the CUDA SDK2 in order to discover resource underutilization. Results show that this underutilization is present, and resource optimization can increase the energy efficiency of GPU-based computation. We then discuss different strategies and proposals to increase energy efficiency in future GPU designs.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123739511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 29
Modeling for Synthesis with System#
C. Köllner, Francisco Mendoza, K. Müller-Glaser
{"title":"Modeling for Synthesis with System#","authors":"C. Köllner, Francisco Mendoza, K. Müller-Glaser","doi":"10.1109/IPDPSW.2012.61","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.61","url":null,"abstract":"While Electronic Design Automation made the shift towards system design and high-level design methods keep on emerging, there is hardly any open framework which allows researchers to quickly prototype novel synthesis algorithms. We present System#, an open source system level design framework based on C#. System# tries to bridge the productivity gap by covering modeling, simulation, code transformations and VHDL code generation in a single extensible platform. We explain how common modeling principles, such as component-based design, the separation of communication and computation, concurrent behavior and time are realized in System#. The implementation of an appropriate simulator kernel is discussed. We demonstrate the potential of code transformations by giving application examples: converting a cycle-accurate sequential specification to an explicit synthesizable finite state machine representation and IP-based design. We conclude that System# is an appropriate research and integration platform which has the potential to add value to the research community.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126947213","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Task Scheduling in Large-scale Distributed Systems Utilizing Partial Reconfigurable Processing Elements
F. Nadeem, I. Ashraf, S. A. Ostadzadeh, Stephan Wong, K. Bertels
{"title":"Task Scheduling in Large-scale Distributed Systems Utilizing Partial Reconfigurable Processing Elements","authors":"F. Nadeem, I. Ashraf, S. A. Ostadzadeh, Stephan Wong, K. Bertels","doi":"10.1109/IPDPSW.2012.6","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.6","url":null,"abstract":"Recent progress in processing speeds, network bandwidths, and middleware technologies has contributed towards novel computing platforms, ranging from large-scale computing clusters to globally distributed systems. Consequently, most current computing systems possess different types of heterogeneous processing resources. Entering into the peta-scale computing era and beyond, reconfigurable processing elements such as Field Programmable Gate Arrays (FPGAs), as well as forthcoming integrated hybrid computing cores, will play a leading role in the design of future distributed systems. Therefore, it is important to develop simulation tools to measure the performance of reconfigurable processors in current and future distributed systems. In this paper, we propose the design of a simulation framework to investigate the performance of reconfigurable processors in distributed systems. The framework adds partial reconfiguration functionality to the reconfigurable nodes. Depending on the available reconfigurable area, each node is able to execute more than one task simultaneously. Furthermore, as a case study, we present a simple task scheduling algorithm to verify the functionality of the simulation framework. The proposed algorithm supports the scheduling of tasks on partially reconfigurable nodes. The simulation results are based on various experiments and they provide a comparison between full (one node-one task mapping) and partial (one node-multiple tasks mapping) configuration of the nodes, for the same set of parameters in each simulation run. Results suggest that the average wasted area per task is lower with partial configuration than with full configuration, verifying the functionality of the simulation framework.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132477841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Reducing Migration-induced Cache Misses
Sajjid Reza, G. Byrd
{"title":"Reducing Migration-induced Cache Misses","authors":"Sajjid Reza, G. Byrd","doi":"10.1109/IPDPSW.2012.215","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.215","url":null,"abstract":"In a large multiprocessor server platform, using multicore chips, the scheduler often migrates a scheduling entity, i.e. a thread or process or virtual machine, in order to achieve better load balancing or ensure fairness. The migration impact is likely to be more severe in virtualized environments, where high over-subscription of logical CPUs is very common for server consolidation workloads or virtual desktop infrastructure deployment. We demonstrate the performance benefit of saving and restoring cached data during migration. In particular, we measure the efficiency (benefit per cache block) of saving various subsets of the cached data, in order to balance implementation cost and complexity with improvements in cycle time. We also describe an implementation that moves cached data when a thread migrates, and we show the benefits in terms of reduced misses and reduced processor cycles.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130292471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
An Efficient Property-Based Attestation Scheme with Flexible Revocation Mechanisms
Yue Xiao-han, Zhou Fucai
{"title":"An Efficient Property-Based Attestation Scheme with Flexible Revocation Mechanisms","authors":"Yue Xiao-han, Zhou Fucai","doi":"10.1109/IPDPSW.2012.150","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.150","url":null,"abstract":"In order to solve the problem of platform configuration information leakage caused by traditional platform authentication in the distributed trusted computing environment, this paper proposes a novel property-based attestation scheme. This scheme has flexible mechanisms for checking property certificate status, is computationally efficient, and is provably secure in the random oracle model. This paper designs the framework of the scheme, defines its security model, gives a concrete construction, and proves in the random oracle model that the scheme satisfies correctness, attestation unforgeability, configuration privacy and non-frameability. Finally, the proposed scheme is compared with the existing PBA schemes on computation cost and communication cost. The results show that our scheme is more practical and efficient.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126840807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
MapReduce across Distributed Clusters for Data-intensive Applications
Lizhe Wang, J. Tao, H. Marten, A. Streit, S. Khan, J. Kolodziej, Dan Chen
{"title":"MapReduce across Distributed Clusters for Data-intensive Applications","authors":"Lizhe Wang, J. Tao, H. Marten, A. Streit, S. Khan, J. Kolodziej, Dan Chen","doi":"10.1109/IPDPSW.2012.249","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.249","url":null,"abstract":"Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP) for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed at more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications. However, current MapReduce implementations are developed to operate on single cluster environments and cannot be leveraged for large-scale distributed data processing across multiple clusters. On the other hand, workflow systems are used for distributed data processing across data centers. It has been reported that the workflow paradigm has some limitations for distributed data processing, such as reliability and efficiency. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters. G-Hadoop uses the Gfarm file system as an underlying file system and executes MapReduce tasks across distributed clusters. Experiments with the G-Hadoop framework on distributed clusters show encouraging results.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126391165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 40
A Unified Study of Epidemic Routing Protocols and their Enhancements
Zhenxin Feng, Kwan-Wu Chin
{"title":"A Unified Study of Epidemic Routing Protocols and their Enhancements","authors":"Zhenxin Feng, Kwan-Wu Chin","doi":"10.1109/IPDPSW.2012.187","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.187","url":null,"abstract":"Epidemic protocols belong to a class of routing paradigms that have wide-ranging applications in Delay Tolerant Networks (DTNs) due to their simplicity, low delays, and little to no reliance on special nodes. To this end, a comprehensive study of their performance will serve as an important guide to future protocol designers. Unfortunately, to date, there is no work that studies epidemic routing protocols using a common framework that evaluates their performance objectively using the same mobility model and parameters. To this end, we study four categories of epidemic routing protocols, namely P-Q epidemic, epidemic with Time-To-Live (TTL), epidemic with Encounter Count (EC), and epidemic with immunity table. Our results show that the probability of transmissions as used in P-Q epidemic may increase delay and decrease delivery ratio. Apart from that, an incorrect TTL value leads to premature discarding of bundles, and thereby has a non-negligible impact on delivery ratio. Epidemic with EC suffers from high buffer occupancy levels and long delivery delays. In addition, epidemic with immunity suffers from high overheads. Hence, we propose three enhancements: dynamic TTL, EC+TTL and cumulative immunity to address the aforementioned limitations. Our results show that dynamic TTL improves delivery ratio by more than 20%, EC+TTL reduces buffer occupancy level by 40%, and improves delivery ratio by at least 40% at high loads. Cumulative immunity reduces the buffer occupancy level of nodes by at least 15% whilst incurring an order of magnitude less signaling overheads.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126419990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
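The abstract above notes that an incorrect TTL value leads to premature discarding of bundles. A minimal sketch of that effect follows, assuming a single bundle, a hand-written contact schedule, and a fresh TTL for every stored copy. None of these details come from the paper, and the function and parameter names (`simulate`, `contacts`, `ttl`) are hypothetical.

```python
def simulate(contacts, source, dest, ttl, steps):
    """Return the time step at which `dest` first receives the bundle, or None.

    contacts: dict mapping a time step to a list of (node_a, node_b) meetings.
    ttl: number of time steps a stored copy survives in a node's buffer.
    """
    buffers = {source: ttl}  # node -> remaining TTL of its copy
    for t in range(steps):
        for a, b in contacts.get(t, []):
            # Epidemic exchange: whoever holds the bundle copies it to the other.
            for holder, other in ((a, b), (b, a)):
                if holder in buffers and other not in buffers:
                    buffers[other] = ttl  # assumption: each copy gets a fresh TTL
                    if other == dest:
                        return t
        # Age every copy by one step and discard the expired ones.
        buffers = {node: left - 1 for node, left in buffers.items() if left - 1 > 0}
    return None
```

With the schedule `{0: [("S", "R")], 5: [("R", "D")]}`, a TTL of 10 lets relay `R` keep its copy until it meets `D` at step 5. With a TTL of 3, the copy expires before that meeting and the bundle is never delivered.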
Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study
S. Ramalingam, Mary W. Hall, Chun Chen
{"title":"Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study","authors":"S. Ramalingam, Mary W. Hall, Chun Chen","doi":"10.1109/IPDPSW.2012.63","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.63","url":null,"abstract":"Scientific libraries are written in a general way in anticipation of a variety of use cases that reduce optimization opportunities. Significant performance gains can be achieved by specializing library code to its execution context: the application in which it is invoked, the input data set used, the architectural platform and its backend compiler. Such specialization is not typically done because it is time-consuming, leads to nonportable code and requires performance-tuning expertise that application scientists may not have. Tool support for library specialization in the above context could potentially reduce the extensive understanding required while significantly improving performance, code reuse and portability. In this work, we study the performance gains achieved by specializing the single-processor sparse linear algebra functions in PETSc (Portable, Extensible Toolkit for Scientific Computation) in the context of three scalable scientific applications on the Hopper Cray XE6 Supercomputer at NERSC. We use CHiLL (Composable High-Level Loop Transformation Framework) to apply source-level transformations tailored to the special needs of sparse computations and automatically generate highly optimized PETSc functions. We demonstrate significant performance improvements of more than 1.8X on the library functions and overall gains of 9 to 24% on three scalable applications that use PETSc's sparse matrix capabilities.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126514795","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Compile-Time Detection of False Sharing via Loop Cost Modeling
M. Tolubaeva, Yonghong Yan, B. Chapman
{"title":"Compile-Time Detection of False Sharing via Loop Cost Modeling","authors":"M. Tolubaeva, Yonghong Yan, B. Chapman","doi":"10.1109/IPDPSW.2012.67","DOIUrl":"https://doi.org/10.1109/IPDPSW.2012.67","url":null,"abstract":"False sharing, which occurs when multiple threads access different data elements on the same cache line, and at least one of them updates the data, is a well-known source of performance degradation on cache-coherent parallel systems. The application developer is often unaware of this problem during program creation, and it can be hard to detect instances of its occurrence in a large code. In this paper, we present a compile-time cost model for estimating the performance impact of false sharing on parallel loops. Using this model, we are able to predict the amount of false sharing that could occur when the loop is executed, and can indicate the percentage of program execution time that is due to maintaining the coherence of data from false sharing. We evaluated our model by comparing its predictions obtained on several computational kernels using 2 to 48 threads against those from actual execution. The results showed that our model can accurately quantify the impact of false sharing on loop performance at compile-time.","PeriodicalId":378335,"journal":{"name":"2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121313375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
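The false-sharing condition defined in the abstract above (different elements, same cache line, at least one writer) can be illustrated with a small counting sketch. This is not the authors' cost model. It assumes a 64-byte cache line, 8-byte elements, an array aligned to a line boundary, a loop in which iteration i writes element i, and a round-robin schedule with a fixed chunk size. The names `cache_line` and `falsely_shared_lines` are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size; real hardware may differ
ELEM_BYTES = 8         # assumed element size, e.g. a double

def cache_line(index, elem_bytes=ELEM_BYTES, line_bytes=CACHE_LINE_BYTES):
    """Return the number of the cache line that holds array element `index`."""
    return (index * elem_bytes) // line_bytes

def falsely_shared_lines(num_elems, num_threads, chunk):
    """Count cache lines that are written by more than one thread.

    Iterations are dealt out round-robin in blocks of `chunk`,
    and iteration i writes array element i.
    """
    writers = {}  # cache line number -> set of thread ids writing to it
    for i in range(num_elems):
        thread = (i // chunk) % num_threads
        writers.setdefault(cache_line(i), set()).add(thread)
    return sum(1 for threads in writers.values() if len(threads) > 1)
```

Under these assumptions, `falsely_shared_lines(64, 4, 1)` reports that all 8 lines are written by several threads, while a chunk size of 8 (one full line per block) gives every line a single writer.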