2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935): latest papers

APENet: a high speed, low latency 3D interconnect network
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392647
R. Ammendola, M. Guagnelli, G. Mazza, F. Palombi, R. Petronzio, D. Rossetti, A. Salamon, P. Vicini
{"title":"APENet: a high speed, low latency 3D interconnect network","authors":"R. Ammendola, M. Guagnelli, G. Mazza, F. Palombi, R. Petronzio, D. Rossetti, A. Salamon, P. Vicini","doi":"10.1109/CLUSTR.2004.1392647","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392647","url":null,"abstract":"Summary form only given. We present APENet, a new high speed, low latency, 3-dimensional interconnect architecture optimized for PC clusters running LQCD-like numerical applications. The hardware implementation is based on a single PCI-X 133MHz network interface card hosting six independent bidirectional channels with a peak bandwidth of 676 MB/s each direction and measured latency less than 10 μs. The internal packet switching capabilities of the network card allows up to three couple of links simultaneously active. The current software environment, based on Linux, is made of a low-level library and a high-level application library. An MPI implementation and a network device driver are being actively developed.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125079209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392611
A. Mamidala, Jiuxing Liu, D. Panda
{"title":"Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms","authors":"A. Mamidala, Jiuxing Liu, D. Panda","doi":"10.1109/CLUSTR.2004.1392611","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392611","url":null,"abstract":"Popular algorithms proposed in the literature for doing Barrier and Allreduce in clusters, such as pair-wise exchange, dissemination and gather-broadcast do not give an optimal performance when there is skew among the nodes in the cluster. In pair-wise exchange and dissemination, all the nodes must arrive for the completion of each step. The gather-broadcast algorithm assumes a fixed tree topology. We propose to use hardware multicast of InfiniBand in the design of an adaptive algorithm that performs well in the presence of skew. In this approach, the topology of the tree is not fixed but adapts depending on the skew. The last arriving node becomes the root of the tree if the skew is sufficiently large. We have carried out in-depth evaluation of our scheme and use synchronization delay as the performance metric for Barrier and Allreduce in the presence of skew. Our performance evaluation shows that our design scales very well with system size. Our designs can reduce the synchronization delay by a factor of 2.28 for Barrier and by a factor of 2.18 in the case of Allreduce. We have examined different skew scenarios and showed that the adaptive design performs either better or comparably to the existing schemes.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131826895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
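The abstract above names the dissemination algorithm as one of the existing barrier schemes in which every node must arrive before a step can complete. As a rough illustration only, the sketch below simulates the textbook dissemination pattern, assuming that in round k node i signals node (i + 2^k) mod n. It is not the adaptive multicast design the paper proposes, and the function names are hypothetical.

```python
def dissemination_schedule(num_nodes):
    """Return, for each round, the (sender, receiver) pairs of a dissemination barrier."""
    # ceil(log2(num_nodes)) rounds, computed with integer arithmetic
    rounds = (num_nodes - 1).bit_length()
    schedule = []
    for k in range(rounds):
        step = 1 << k  # signalling distance doubles every round
        schedule.append([(i, (i + step) % num_nodes) for i in range(num_nodes)])
    return schedule


def all_nodes_synchronized(num_nodes):
    """Simulate the rounds and check that every node has heard, transitively, from all nodes."""
    heard = [{i} for i in range(num_nodes)]
    for pairs in dissemination_schedule(num_nodes):
        # All messages of a round are sent before any is merged at the receiver
        snapshot = [set(s) for s in heard]
        for sender, receiver in pairs:
            heard[receiver] |= snapshot[sender]
    return all(len(s) == num_nodes for s in heard)


if __name__ == "__main__":
    for n in (2, 5, 8):
        print(n, len(dissemination_schedule(n)), all_nodes_synchronized(n))
```

Because each round needs a message from a specific partner, one late node stalls its partners in every round, which is the skew sensitivity the abstract refers to.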
Fault-tolerant grid services using primary-backup: feasibility and performance
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392608
Xianan Zhang, D. Zagorodnov, M. Hiltunen, K. Marzullo, R. Schlichting
{"title":"Fault-tolerant grid services using primary-backup: feasibility and performance","authors":"Xianan Zhang, D. Zagorodnov, M. Hiltunen, K. Marzullo, R. Schlichting","doi":"10.1109/CLUSTR.2004.1392608","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392608","url":null,"abstract":"The combination of grid technology and Web services has produced an attractive platform for deploying distributed applications: grid services, as represented by the Open Grid Services Infrastructure (OGSI) and its Globus toolkit implementation. As the use of grid services grows in popularity, tolerating failures becomes increasingly important. This work addresses the problem of building a reliable and highly-available grid service by replicating the service on two or more hosts using the primary-backup approach. The primary goal is to evaluate the ease and efficiency with which this can be done, by first designing a primary-backup protocol using OGSI, and then implementing it using Globus to evaluate performance implications and tradeoffs. We compared three implementations: one that makes heavy use of the notification interface defined in OGSI, one that uses standard grid service requests instead of notification, and one that uses low-level socket primitives. The overall conclusion is that, while the performance penalty of using Globus primitives - especially notification - for replica coordination can be significant, the OGSI model is suitable for building highly-available services and it makes the task of engineering such services easier.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130831338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 73
On optimizing collective communication
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392612
E. Chan, M. Heimlich, A. Purkayastha, R. V. D. Geijn
{"title":"On optimizing collective communication","authors":"E. Chan, M. Heimlich, A. Purkayastha, R. V. D. Geijn","doi":"10.1109/CLUSTR.2004.1392612","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392612","url":null,"abstract":"We discuss issues related to the high-performance implementation of collective communications operations on distributed-memory computer architectures. Using a combination of known techniques (many of which were first proposed in the 1980s and early 1990s) along with careful exploitation of communication modes supported by MPI, we have developed implementations that have improved performance in most situations compared to those currently supported by public domain implementations of MPI such as MPICH. Performance results from a large Intel Pentium 4 (R) processor cluster are included.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132054715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 58
An efficient end-host architecture for cluster communication
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392605
Xin Qi, Gabriel Parmer, R. West
{"title":"An efficient end-host architecture for cluster communication","authors":"Xin Qi, Gabriel Parmer, R. West","doi":"10.1109/CLUSTR.2004.1392605","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392605","url":null,"abstract":"Cluster computing environments built from commodity hardware have provided a cost-effective solution for many scientific and high-performance applications. Likewise, middleware techniques have provided the basis for large-scale applications to communicate and exchange data across the various end-hosts in a distributed system. Unfortunately, middleware services are typically encapsulated in user-level address spaces that suffer from scheduling delays and communication overheads induced by the host kernel. For various high performance distributed computing applications such overheads are unacceptable. This work therefore addresses the problem of providing an efficient end-host architecture to support application-specific communication services at user-level, without the need to explicitly schedule such services or copy data via the kernel. We briefly describe a sandboxing mechanism that allows applications to configure and deploy services at user-level that may execute in the context of any address space. Using Linux as the basis for our approach, we focus specifically on the implementation of a user-space network protocol stack that avoids copying data via the kernel when communicating with the network interface. Our approach enables services to efficiently process and forward data via proxies, or intermediate hosts, in the communication path of high performance data streams. Unlike other user-level networking implementations, our method makes no special hardware requirements. Results show that we achieve a substantial increase in throughput, and a reduction in jitter, over comparable user-space communication methods.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133441209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
On fairness in distributed job scheduling across multiple sites
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392599
Gerald Sabin, Vishvesh Sahasrabudhe, P. Sadayappan
{"title":"On fairness in distributed job scheduling across multiple sites","authors":"Gerald Sabin, Vishvesh Sahasrabudhe, P. Sadayappan","doi":"10.1109/CLUSTR.2004.1392599","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392599","url":null,"abstract":"Coordinated scheduling across multiple sites over the grid has become a possibility due to grid technologies such as Globus and the Silver meta-scheduler. This would allow user jobs to be transparently executed at remote sites across the grid, instead of a particular local cluster. Previous research has shown this type of job distribution to be beneficial in terms of average metrics such as loss of capacity and turnaround time. This research has sparked interest in implementing such schemes, for example on the Cluster Ohio system. However, an issue that has not been addressed is that of fairness - will jobs at less loaded sites be significantly adversely affected by the distributed scheduling schemes? Trace based simulations show that indeed, there can be considerable unfairness to the less loaded sites when previously proposed distributed scheduling schemes are used. We assess approaches to enhance fairness to jobs at local sites and show that they improve fairness while also providing very good overall performance.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133066303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 15
JuxtaView - a tool for interactive visualization of large imagery on scalable tiled displays
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392640
N. K. Krishnaprasad, V. Vishwanath, S. Venkataraman, A. G. Rao, L. Renambot, J. Leigh, Andrew E. Johnson, B. Davis
{"title":"JuxtaView - a tool for interactive visualization of large imagery on scalable tiled displays","authors":"N. K. Krishnaprasad, V. Vishwanath, S. Venkataraman, A. G. Rao, L. Renambot, J. Leigh, Andrew E. Johnson, B. Davis","doi":"10.1109/CLUSTR.2004.1392640","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392640","url":null,"abstract":"JuxtaView is a cluster-based application for viewing ultra-high-resolution images on scalable tiled displays. We present in JuxtaView, a new parallel computing and distributed memory approach for out-of-core montage visualization, using LambdaRAM, a software-based network-level cache system. The ultimate goal of JuxtaView is to enable a user to interactively roam through potentially terabytes of distributed, spatially referenced image data such as those from electron microscopes, satellites and aerial photographs. In working towards this goal, we describe our first prototype implemented over a local area network, where the image is distributed using LambdaRAM, on the memory of all nodes of a PC cluster driving a tiled display wall. Aggressive prefetching schemes employed by LambdaRAM help to reduce latency involved in remote memory access. We compare LambdaRAM with a more traditional memory-mapped file approach for out-of-core visualization.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132284929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 56
RAAC: an architecture for scalable, reliable storage in clusters
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392613
Manoj Pillai, Mario Lauria
{"title":"RAAC: an architecture for scalable, reliable storage in clusters","authors":"Manoj Pillai, Mario Lauria","doi":"10.1109/CLUSTR.2004.1392613","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392613","url":null,"abstract":"Striping data across multiple nodes has been recognized as an effective technique for delivering high-bandwidth I/O to applications running on clusters. However the technique is vulnerable to disk failure. We present an I/O architecture for clusters called reliable array of autonomous controllers (RAAC) that builds on the technique of RAID style data redundancy. The RAAC architecture uses a two-tier layout that enables the system to scale in terms of storage capacity and transfer bandwidth while avoiding the synchronization overhead incurred in a distributed RAID system. We describe our implementation of RAAC in PVFS, and compare the performance of parity-based redundancy in RAAC and in a conventional distributed RAID architecture.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133009008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
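The RAAC abstract above refers to parity-based, RAID-style redundancy for data striped across nodes. The sketch below shows only the generic XOR-parity idea, assuming equally sized blocks and a single lost block. It does not model RAAC's two-tier layout or its PVFS implementation, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks; used both to compute and to apply parity."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing data block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl"]  # one block per storage node
    parity = xor_blocks(stripe)           # stored on a separate node
    # Pretend the node holding stripe[1] failed
    print(rebuild_missing([stripe[0], stripe[2]], parity))
```

Every write to a data block also requires updating the parity block, which is one source of the synchronization overhead that the abstract says a distributed RAID system incurs.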
Grid systems deployment & management using Rocks
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392631
Federico D. Sacerdoti, Sandeep Chandra, K. Bhatia
{"title":"Grid systems deployment & management using Rocks","authors":"Federico D. Sacerdoti, Sandeep Chandra, K. Bhatia","doi":"10.1109/CLUSTR.2004.1392631","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392631","url":null,"abstract":"Wide-area grid deployments are becoming a standard for shared cyberinfrastructure within scientific domain communities. These systems enable resource sharing, data management and publication, collaboration, and shared development of community resources. This work describes the systems management solution developed for one such grid deployment, the GEON Grid (GEOsciences Network), a domain-specific grid of clusters for geological research. GEON provides a standardized base software stack across all sites to ensure interoperability while providing structures that allow local customization. This situation gives rise to a set of requirements that are difficult to satisfy with existing tools. Cluster management software is available that allows administrators to specify and install a common software stack on all nodes of a single cluster and enable centralized control and diagnostics of its components with minimal effort. While grid deployments have similar management requirements to computational clusters, they have faced a lack of available tools to address their needs. We describe extensions to the Rocks cluster distribution to satisfy several key goals of the GEON Grid, and show how these wide-area cluster integration extensions satisfy the most important of these goals.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124468435","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
GNET-1: gigabit Ethernet network testbed
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392616
Yuetsu Kodama, T. Kudoh, Ryousei Takano, Hitoshi Sato, Osamu Tatebe, Satoshi Sekiguchi
{"title":"GNET-1: gigabit Ethernet network testbed","authors":"Yuetsu Kodama, T. Kudoh, Ryousei Takano, Hitoshi Sato, Osamu Tatebe, Satoshi Sekiguchi","doi":"10.1109/CLUSTR.2004.1392616","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392616","url":null,"abstract":"GNET-1 is a fully programmable network testbed. It provides functions such as wide area network emulation, network instrumentation, traffic shaping, and traffic generation at gigabit Ethernet wire speeds by programming the core FPGA. GNET-1 is a powerful tool for developing network-aware grid software. It is also a network monitoring and traffic-shaping tool that provides high-performance communication over wide area networks. This work describes several sample uses of GNET-1 and presents its architecture.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"226 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132487183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53