2011 IEEE International Conference on Cluster Computing: Latest Publications

Investigating Scenario-Conscious Asynchronous Rendezvous over RDMA
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.65
Judicael A. Zounmevo, A. Afsahi
{"title":"Investigating Scenario-Conscious Asynchronous Rendezvous over RDMA","authors":"Judicael A. Zounmevo, A. Afsahi","doi":"10.1109/CLUSTER.2011.65","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.65","url":null,"abstract":"In this paper, we propose a light-weight asynchronous message progression mechanism for large message transfers in Message Passing Interface (MPI) Rendezvous protocol that is scenario-conscious and consequently overhead-free in cases where independent message progression naturally happens. Without requiring a dedicated thread, we take advantage of small bursts of CPU to poll for message transfer conditions. The existing application thread is parasitized for the purpose of getting those small bursts of CPU. Our proposed approach is only triggered when the message transfer would otherwise be deferred to the MPI wait call, and it allows for full message progression, achieving 100% overlap. It does not add to the memory footprint of the applications, and is effective in improving the communication performance of most of the applications studied in this paper.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126588883","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.18
Yuan Tian, S. Klasky, H. Abbasi, J. Lofstead, R. Grout, N. Podhorszki, Qing Liu, Yandong Wang, Weikuan Yu
{"title":"EDO: Improving Read Performance for Scientific Applications through Elastic Data Organization","authors":"Yuan Tian, S. Klasky, H. Abbasi, J. Lofstead, R. Grout, N. Podhorszki, Qing Liu, Yandong Wang, Weikuan Yu","doi":"10.1109/CLUSTER.2011.18","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.18","url":null,"abstract":"Large scale scientific applications are often bottlenecked due to the writing of checkpoint-restart data. Much work has been focused on improving their write performance. With the mounting needs of scientific discovery from these datasets, it is also important to provide good read performance for many common access patterns, which requires effective data organization. To address this issue, we introduce Elastic Data Organization (EDO), which can transparently enable different data organization strategies for scientific applications. Through its flexible data ordering algorithms, EDO harmonizes different access patterns with the underlying file system. Two levels of data ordering are introduced in EDO. One works at the level of data groups (a.k.a process groups). It uses Hilbert Space Filling Curves (SFC) to balance the distribution of data groups across storage targets. Another governs the ordering of data elements within a data group. It divides a data group into sub chunks and strikes a good balance between the size of sub chunks and the number of seek operations. Our experimental results demonstrate that EDO is able to achieve balanced data distribution across all dimensions and improve the read performance of multidimensional datasets in scientific applications.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116640214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
Automatic Hybrid OpenMP + MPI Program Generation for Dynamic Programming Problems
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.28
Dennis Vandenberg, Q. Stout
{"title":"Automatic Hybrid OpenMP + MPI Program Generation for Dynamic Programming Problems","authors":"Dennis Vandenberg, Q. Stout","doi":"10.1109/CLUSTER.2011.28","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.28","url":null,"abstract":"We describe a program that automatically generates a hybrid OpenMP + MPI program for a class of recursive calculations with template dependencies. Many useful generalized dynamic programming problems fit this category, such as Multiple String Alignment and multi-arm Bernoulli Bandit problems. Solving problems like these, especially those involving several dimensions, can use a significant amount of memory and time. Our generator addresses these issues by dividing the problem into many tiles that can be solved in parallel. Programs generated using this program generator are capable of solving large problems and achieve good scalability when run on many cores. The input supplied to the generator is a high level description of the problem, the output is a fully functioning parallel program for a cluster of shared memory nodes. This high level approach to parallel computation allows the generator to have a large amount of control over memory allocation, load balancing and calculation ordering.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123030280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
AA-Dedupe: An Application-Aware Source Deduplication Approach for Cloud Backup Services in the Personal Computing Environment
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.20
Yinjin Fu, Hong Jiang, Nong Xiao, Lei Tian, F. Liu
{"title":"AA-Dedupe: An Application-Aware Source Deduplication Approach for Cloud Backup Services in the Personal Computing Environment","authors":"Yinjin Fu, Hong Jiang, Nong Xiao, Lei Tian, F. Liu","doi":"10.1109/CLUSTER.2011.20","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.20","url":null,"abstract":"The market for cloud backup services in the personal computing environment is growing due to large volumes of valuable personal and corporate data being stored on desktops, laptops and smart phones. Source deduplication has become a mainstay of cloud backup that saves network bandwidth and reduces storage space. However, there are two challenges facing deduplication for cloud backup service clients: (1) low deduplication efficiency due to a combination of the resource-intensive nature of deduplication and the limited system resources on the PC-based client site, and (2) low data transfer efficiency since post-deduplication data transfers from source to backup servers are typically very small but must often cross a WAN. In this paper, we present AA-Dedupe, an application-aware source deduplication scheme, to significantly reduce the computational overhead, increase the deduplication throughput and improve the data transfer efficiency. The AA-Dedupe approach is motivated by our key observations of the substantial differences among applications in data redundancy and deduplication characteristics, and thus is based on an application-aware index structure that effectively exploits this application awareness. Our experimental evaluations, based on an AA-Dedupe prototype implementation, show that our scheme can improve deduplication efficiency over the state-of-art source-deduplication methods by a factor of 2-7, resulting in shortened backup window, increased power-efficiency and reduced cost for cloud backup services.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126978214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 96
Design of HPC Node with Heterogeneous Processors
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.23
Zheng Cao, Hongwei Tang, Qiang Li, Bo-Zhang Li, Fei Chen, Kai Wang, Xuejun An, Ninghui Sun
{"title":"Design of HPC Node with Heterogeneous Processors","authors":"Zheng Cao, Hongwei Tang, Qiang Li, Bo-Zhang Li, Fei Chen, Kai Wang, Xuejun An, Ninghui Sun","doi":"10.1109/CLUSTER.2011.23","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.23","url":null,"abstract":"Heterogeneous Computing is becoming an important technology trend in HPC, where more and more heterogeneous processors are used. However, in traditional node architecture, heterogeneous processors are always used as coprocessors. Such usage increases the communication latency between heterogeneous processors and prevents the node from achieving high density. With the purpose of improving communication efficiency between heterogeneous processors, this paper proposed a new node architecture named HeteNode. In HeteNode, general purpose processors and heterogeneous processors are interconnected by a system controller directly and play the same role in both process of communication and process of computation. The prototype of HeteNode which contains nine processors in 1U chassis is built. Evaluation carried out on the prototype shows that 580ns minimum intra-node latency and 1.78us minimum inter-node latency between heterogeneous processors are achieved. Besides, NPB benchmarks show good scalability in HeteNode.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116526485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.67
A. Singh, S. Potluri, Hao Wang, K. Kandalla, S. Sur, D. Panda
{"title":"MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit","authors":"A. Singh, S. Potluri, Hao Wang, K. Kandalla, S. Sur, D. Panda","doi":"10.1109/CLUSTER.2011.67","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.67","url":null,"abstract":"General Purpose Graphics Processing Units (GPGPUs) are rapidly becoming an integral part of high performance system architectures. The Tianhe-1A and Tsubame systems received significant attention for their architectures that leverage GPGPUs. Increasingly many scientific applications that were originally written for CPUs using MPI for parallelism are being ported to these hybrid CPU-GPU clusters. In the traditional sense, CPUs perform computation while the MPI library takes care of communication. When computation is performed on GPGPUs, the data has to be moved from device memory to main memory before it can be used in communication. Though GPGPUs provide huge compute potential, the data movement to and from GPGPUs is both a performance and productivity bottleneck. Recently, the MVAPICH2 MPI library has been modified to directly support point-to-point MPI communication from the GPU memory [1]. Using this support, programmers do not need to explicitly move data to main memory before using MPI. This feature also enables performance improvement due to tight integration of GPU data movement and MPI internal protocols. Typically, scientific applications spend a significant portion of their execution time in collective communication. Hence, optimizing performance of collectives has a significant impact on their performance. MPI_Alltoall is a heavily used collective that has O(N2) communication, for N processes. In this paper, we outline the major design alternatives for MPI_Alltoall collective communication operation on GPGPU clusters. We propose three design alternatives and provide a corresponding performance analysis. Using our dynamic staging techniques, the latency of MPI_Alltoall on GPU clusters can be improved by 44% over a user level implementation and 31% over a send-recv based implementation for 256 KByte messages on 8 processes.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115163344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
Large-Scale Simulator for Global Data Infrastructure Optimization
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.15
Sergio Herrero-Lopez, John R. Williams, Abel Sanchez
{"title":"Large-Scale Simulator for Global Data Infrastructure Optimization","authors":"Sergio Herrero-Lopez, John R. Williams, Abel Sanchez","doi":"10.1109/CLUSTER.2011.15","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.15","url":null,"abstract":"IT infrastructures in global corporations are appropriately compared with nervous systems, in which body parts (interconnected datacenters) exchange signals (request responses) in order to coordinate actions (data visualization and manipulation). A priori inoffensive perturbations in the operation of the system or the elements composing the infrastructure can lead to catastrophic consequences. Downtime disables the capability of clients reaching the latest versions of the data and/or propagating their individual contributions to other clients, potentially costing millions of dollars to the organization affected. The imperative need of guaranteeing the proper functioning of the system not only forces to pay particular attention to network outages, hot-objects or application defects, but also slows down the deployment of new capabilities, features and equipment upgrades. Under these circumstances, decision cycles for these modifications can be extremely conservative, and be prolonged for years, involving multiple authorities across departments of the organization. Frequently, the solutions adopted are years behind state-of-the art technologies or phased out compared to leading research on the IT infrastructure field. In this paper, the utilization of a large-scale data infrastructure simulator is proposed, in order to evaluate the impact of \" what if\" scenarios on the performance, availability and reliability of the system. The goal is to provide data center operators a tool that allows understanding and predicting the consequences of the deployment of new network topologies, hardware configurations or software applications in a global data infrastructure, without affecting the service. The simulator was constructed using a multi-layered approach, providing a granularity down to the individual server component and client action, and was validated against a downscaled version of the data infrastructure of a Fortune 500 company.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133935657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Achieving Scalable Parallelization for the Hessenberg Factorization
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.16
A. Castaldo, R. Clint Whaley
{"title":"Achieving Scalable Parallelization for the Hessenberg Factorization","authors":"A. Castaldo, R. Clint Whaley","doi":"10.1109/CLUSTER.2011.16","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.16","url":null,"abstract":"Much of dense linear algebra has been successfully blocked to concentrate the majority of its time in the Level~3 BLAS, which are not only efficient for serial computation, but also scale well for parallelism. For the Hessenberg factorization, which is a critical step in computing the eigenvalues and vectors, however, performance of the best known algorithm is still strongly limited by the memory speed, which does not tend to scale well at all. In this paper we present an adaptation of our Parallel Cache Assignment (PCA) technique to the Hessenberg factorization, and show that it achieves super linear speedup over the corresponding serial algorithm and a more than four-fold speedup over the best known algorithm for small and medium sized problems.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134472299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
Scheduling Workflows in Opportunistic Environments
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.72
María M. López, E. Heymann, M. A. Senar
{"title":"Scheduling Workflows in Opportunistic Environments","authors":"María M. López, E. Heymann, M. A. Senar","doi":"10.1109/CLUSTER.2011.72","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.72","url":null,"abstract":"Workflow applications exhibit both high computation times and data transfer rates. For this reason, the completion time of the workflow is high. To reduce completion time, the tasks of a workflow ought to run on different machines interconnected by a network. Correct assignment of tasks to machines within the runtime environment is an important aspect in the completion time or make span. The manager making the assignment is the scheduler. The main problem of a static scheduler is that it ignores the changes that occur in the execution environment during DAG execution. To solve this problem, we developed a new dynamic scheduler. This dynamic scheduler monitors the behavior of the tasks executed as well as the execution environment, and it reacts to the changes detected by adapting the scheduling of the rest of pending tasks. The objective is to reduce the overhead incurred by excessive self-adaptations, without affecting the make span. To reduce overhead, the algorithm self-adapts only when an improvement in make span is expected. The proposed policies have been simulated and then executed in a real environment. These executions have achieved a reduction of the overhead of greater than 20%.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121426112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort
2011 IEEE International Conference on Cluster Computing Pub Date: 2011-09-26 DOI: 10.1109/CLUSTER.2011.53
Kyle Spafford, J. Meredith, J. Vetter
{"title":"Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort","authors":"Kyle Spafford, J. Meredith, J. Vetter","doi":"10.1109/CLUSTER.2011.53","DOIUrl":"https://doi.org/10.1109/CLUSTER.2011.53","url":null,"abstract":"In the past few years, performance improvements in CPUs and memory technologies have outpaced those of storage systems. When extrapolated to the exascale, this trend places strict limits on the amount of data that can be written to disk for full analysis, resulting in an increased reliance on characterizing in-memory data. Many of these characterizations are simple, but require sorted data. This paper explores an example of this type of characterization -- the identification of quartiles and statistical outliers -- and presents a performance analysis of a distributed heterogeneous radix sort as well as an assessment of current architectural bottlenecks.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129679950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6