2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935): Latest Papers, Page 3

APENet: a high speed, low latency 3D interconnect network

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392647

R. Ammendola, M. Guagnelli, G. Mazza, F. Palombi, R. Petronzio, D. Rossetti, A. Salamon, P. Vicini

Citations: 5

Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392611

A. Mamidala, Jiuxing Liu, D. Panda

Abstract: Popular algorithms proposed in the literature for doing Barrier and Allreduce in clusters, such as pair-wise exchange, dissemination and gather-broadcast, do not give an optimal performance when there is skew among the nodes in the cluster. In pair-wise exchange and dissemination, all the nodes must arrive for the completion of each step. The gather-broadcast algorithm assumes a fixed tree topology. We propose to use hardware multicast of InfiniBand in the design of an adaptive algorithm that performs well in the presence of skew. In this approach, the topology of the tree is not fixed but adapts depending on the skew. The last arriving node becomes the root of the tree if the skew is sufficiently large. We have carried out in-depth evaluation of our scheme and use synchronization delay as the performance metric for Barrier and Allreduce in the presence of skew. Our performance evaluation shows that our design scales very well with system size. Our designs can reduce the synchronization delay by a factor of 2.28 for Barrier and by a factor of 2.18 in the case of Allreduce. We have examined different skew scenarios and showed that the adaptive design performs either better or comparably to the existing schemes.
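The abstract contrasts fixed-pattern algorithms with a tree rooted at the last arriving node. In the fixed-pattern case, a late node's signal must still pass through every step. The sketch below is a toy latency-only model, not the authors' InfiniBand implementation. Its assumptions:

- Every message costs a fixed `latency`.
- The dissemination barrier runs ceil(log2 P) rounds.
- `oracle_last_root_exit_times` is a hypothetical idealization. The simulation knows in advance which process arrives last, and one multicast reaches all nodes for the cost of one message.

Synchronization delay is taken here as the time from the last arrival until the last process exits. The paper's exact definition may differ.

```python
import math


def dissemination_exit_times(arrivals, latency):
    """Dissemination barrier under a simple latency-only model (toy).

    In round k, process i signals (i + 2**k) % P and waits for the signal
    from (i - 2**k) % P. It leaves round k at
    max(its own time, that partner's round-k time + latency).
    """
    p = len(arrivals)
    times = list(arrivals)
    rounds = math.ceil(math.log2(p)) if p > 1 else 0
    for k in range(rounds):
        step = 2 ** k
        times = [max(times[i], times[(i - step) % p] + latency) for i in range(p)]
    return times


def oracle_last_root_exit_times(arrivals, latency):
    """Hypothetical idealized 'last arriver becomes root' barrier.

    Each early process announces its arrival with one multicast (cost:
    latency). The last arriver acts as root once it has heard from everyone,
    then multicasts the release (cost: latency).
    """
    p = len(arrivals)
    root = max(range(p), key=lambda i: arrivals[i])
    heard_all = max([arrivals[root]] +
                    [arrivals[j] + latency for j in range(p) if j != root])
    return [heard_all if i == root else heard_all + latency for i in range(p)]


def synchronization_delay(arrivals, exit_times):
    """Time from the last arrival until the last process leaves the barrier."""
    return max(exit_times) - max(arrivals)


if __name__ == "__main__":
    arrivals = [0.0] * 8
    arrivals[5] = 100.0  # one heavily skewed process
    print(synchronization_delay(arrivals, dissemination_exit_times(arrivals, 1.0)))     # 3.0
    print(synchronization_delay(arrivals, oracle_last_root_exit_times(arrivals, 1.0)))  # 1.0
```

With 8 processes and one late arrival, the late process's signal needs log2(8) = 3 dissemination rounds to reach every process. In this idealized model, the last-arriver root needs only one release multicast. Real skew patterns and InfiniBand costs will behave differently.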

Citations: 46

Fault-tolerant grid services using primary-backup: feasibility and performance

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392608

Xianan Zhang, D. Zagorodnov, M. Hiltunen, K. Marzullo, R. Schlichting

Abstract: The combination of grid technology and Web services has produced an attractive platform for deploying distributed applications: grid services, as represented by the Open Grid Services Infrastructure (OGSI) and its Globus toolkit implementation. As the use of grid services grows in popularity, tolerating failures becomes increasingly important. This work addresses the problem of building a reliable and highly-available grid service by replicating the service on two or more hosts using the primary-backup approach. The primary goal is to evaluate the ease and efficiency with which this can be done, by first designing a primary-backup protocol using OGSI, and then implementing it using Globus to evaluate performance implications and tradeoffs. We compared three implementations: one that makes heavy use of the notification interface defined in OGSI, one that uses standard grid service requests instead of notification, and one that uses low-level socket primitives. The overall conclusion is that, while the performance penalty of using Globus primitives (especially notification) for replica coordination can be significant, the OGSI model is suitable for building highly-available services and it makes the task of engineering such services easier.
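The generic primary-backup pattern the abstract builds on can be illustrated with a minimal in-process Python sketch. It has one primary and one or more backups, and replicas are coordinated before the client gets a reply. The sketch does not model OGSI, Globus, notifications or sockets. `Replica`, `PrimaryBackupService` and the key-value state are hypothetical, and failure detection is reduced to an `alive` flag.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a hypothetical key-value service state."""
    name: str
    state: dict = field(default_factory=dict)
    alive: bool = True

    def apply(self, key, value):
        self.state[key] = value
        return True  # acknowledgement back to the primary


class PrimaryBackupService:
    """Minimal primary-backup replication sketch (in-process, no network).

    The primary applies each update and forwards it to every live backup
    before replying. A backup promoted after a primary crash therefore
    already holds every acknowledged update.
    """

    def __init__(self, replicas):
        self.replicas = list(replicas)

    @property
    def primary(self):
        # Assumption: the first live replica in a fixed order is the primary.
        for r in self.replicas:
            if r.alive:
                return r
        raise RuntimeError("no live replica")

    def update(self, key, value):
        """Apply on the primary, then on each live backup, then reply."""
        primary = self.primary
        primary.apply(key, value)
        acks = [b.apply(key, value) for b in self.replicas
                if b is not primary and b.alive]
        if not all(acks):
            raise RuntimeError("a backup failed to acknowledge the update")
        return "ok"

    def read(self, key):
        return self.primary.state.get(key)


if __name__ == "__main__":
    svc = PrimaryBackupService([Replica("host-a"), Replica("host-b")])
    svc.update("job-42", "running")
    svc.replicas[0].alive = False  # simulate a crash of the primary
    print(svc.primary.name, svc.read("job-42"))  # host-b running
```

Waiting for every backup before replying is what makes failover safe in this sketch. In a real deployment, this forwarding step is where replica-coordination cost of the kind the abstract discusses would show up.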

Citations: 73

On optimizing collective communication

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392612

E. Chan, M. Heimlich, A. Purkayastha, R. V. D. Geijn

Citations: 58

An efficient end-host architecture for cluster communication

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392605

Xin Qi, Gabriel Parmer, R. West

Abstract: Cluster computing environments built from commodity hardware have provided a cost-effective solution for many scientific and high-performance applications. Likewise, middleware techniques have provided the basis for large-scale applications to communicate and exchange data across the various end-hosts in a distributed system. Unfortunately, middleware services are typically encapsulated in user-level address spaces that suffer from scheduling delays and communication overheads induced by the host kernel. For various high performance distributed computing applications such overheads are unacceptable. This work therefore addresses the problem of providing an efficient end-host architecture to support application-specific communication services at user-level, without the need to explicitly schedule such services or copy data via the kernel. We briefly describe a sandboxing mechanism that allows applications to configure and deploy services at user-level that may execute in the context of any address space. Using Linux as the basis for our approach, we focus specifically on the implementation of a user-space network protocol stack that avoids copying data via the kernel when communicating with the network interface. Our approach enables services to efficiently process and forward data via proxies, or intermediate hosts, in the communication path of high performance data streams. Unlike other user-level networking implementations, our method makes no special hardware requirements. Results show that we achieve a substantial increase in throughput, and a reduction in jitter, over comparable user-space communication methods.

Citations: 11

On fairness in distributed job scheduling across multiple sites

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392599

Gerald Sabin, Vishvesh Sahasrabudhe, P. Sadayappan

Citations: 15

JuxtaView - a tool for interactive visualization of large imagery on scalable tiled displays

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392640

N. K. Krishnaprasad, V. Vishwanath, S. Venkataraman, A. G. Rao, L. Renambot, J. Leigh, Andrew E. Johnson, B. Davis

Citations: 56

RAAC: an architecture for scalable, reliable storage in clusters

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392613

Manoj Pillai, Mario Lauria

Citations: 2

Grid systems deployment & management using Rocks

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392631

Federico D. Sacerdoti, Sandeep Chandra, K. Bhatia

Abstract: Wide-area grid deployments are becoming a standard for shared cyberinfrastructure within scientific domain communities. These systems enable resource sharing, data management and publication, collaboration, and shared development of community resources. This work describes the systems management solution developed for one such grid deployment, the GEON Grid (GEOsciences Network), a domain-specific grid of clusters for geological research. GEON provides a standardized base software stack across all sites to ensure interoperability while providing structures that allow local customization. This situation gives rise to a set of requirements that are difficult to satisfy with existing tools. Cluster management software is available that allows administrators to specify and install a common software stack on all nodes of a single cluster and enable centralized control and diagnostics of its components with minimal effort. While grid deployments have similar management requirements to computational clusters, they have faced a lack of available tools to address their needs. We describe extensions to the Rocks cluster distribution to satisfy several key goals of the GEON Grid, and show how these wide-area cluster integration extensions satisfy the most important of these goals.

Citations: 35

GNET-1: gigabit Ethernet network testbed

2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392616

Yuetsu Kodama, T. Kudoh, Ryousei Takano, Hitoshi Sato, Osamu Tatebe, Satoshi Sekiguchi

Citations: 53