2005 IEEE International Conference on Cluster Computing: Latest Papers

Transparently Achieving Superior Socket Performance Using Zero Copy Socket Direct Protocol over 20Gb/s InfiniBand Links

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347027

Dror Goldenberg, Michael Kagan, Ran Ravid, Michael S. Tsirkin

Citations: 20

Multi-stream MPA

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347033

Caitlin Bestler

Citations: 2

Spare Instance: an Adaptive Mechanism for Managing Cluster Applications

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347092

Yingyin Jiang, Dan Meng, Danjun Liu

Citations: 0

Job-Site Level Fault Tolerance for Cluster and Grid environments

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347043

K. Limaye, C. Leangsuksun, Z. Greenwood, S. Scott, C. Engelmann, Richard Libby, K. Chanchio

Abstract: In order to adopt high-performance clusters and grid computing for mission-critical applications, fault tolerance is a necessity. In distributed systems, fault tolerance is commonly achieved through checkpoint-recovery and through job replication on alternative resources in case of a system outage. The first approach depends on the system's MTTR (mean time to repair), while the latter depends on the availability of alternative sites to run replicas. These approaches need to be complemented by proactively handling failures at the job-site level, ensuring high system availability with no loss of user-submitted jobs. This paper discusses a novel fault tolerance technique that enables job-site recovery in Beowulf cluster-based grid environments, whereas existing techniques give up on a failed system and seek alternative resources. Our results suggest a sizable aggregate performance improvement from an implementation of our method in Globus-enabled HA-OSCAR. The technique, called "smart failover", provides a transparent and graceful recovery mechanism that saves job states in a local job-manager queue and transfers those states to the backup server periodically and on critical system events. Thus, whenever a failover occurs, the backup server is able to restart the jobs from their last saved state.

Citations: 29
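The "smart failover" idea summarized in the abstract above can be illustrated with a minimal Python sketch. It is not the HA-OSCAR implementation: the class names `JobState`, `PrimaryServer`, and `BackupServer`, the tick-based replication period `sync_every`, and the `progress` field are hypothetical assumptions chosen for illustration. The sketch shows only the checkpoint-and-replicate pattern: job states kept in a local queue are copied to the backup periodically and on critical events, and on failover the backup restarts jobs from the last copy it received.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class JobState:
    """Snapshot of one queued job (hypothetical fields, for illustration only)."""
    job_id: str
    progress: int  # e.g. number of completed steps


class BackupServer:
    """Holds the most recent job-state snapshot received from the head node."""

    def __init__(self):
        self.saved = {}

    def receive(self, states):
        # Replace the previous snapshot with the newly transferred one.
        self.saved = dict(states)

    def take_over(self):
        # On failover, restart each job from its last saved state.
        return sorted(self.saved.values(), key=lambda s: s.job_id)


class PrimaryServer:
    """Keeps job states in a local queue and replicates them to the backup."""

    def __init__(self, backup, sync_every=3):
        self.backup = backup
        self.sync_every = sync_every  # hypothetical replication period, in ticks
        self.queue = {}
        self.ticks = 0

    def submit(self, job_id):
        self.queue[job_id] = JobState(job_id, 0)

    def tick(self):
        # Advance every job by one step, then replicate if the period elapsed.
        self.queue = {
            jid: replace(s, progress=s.progress + 1) for jid, s in self.queue.items()
        }
        self.ticks += 1
        if self.ticks % self.sync_every == 0:
            self.sync()

    def critical_event(self):
        # A critical system event forces an immediate transfer to the backup.
        self.sync()

    def sync(self):
        self.backup.receive(self.queue)


backup = BackupServer()
primary = PrimaryServer(backup, sync_every=3)
primary.submit("a")
primary.submit("b")
for _ in range(4):
    primary.tick()
# The periodic sync ran at tick 3, so the backup restarts from progress 3, not 4.
print([(s.job_id, s.progress) for s in backup.take_over()])
primary.critical_event()
print([(s.job_id, s.progress) for s in backup.take_over()])
```

The periodic sync bounds how much work a failover can lose to at most `sync_every` ticks, while the event-triggered sync closes that gap when a failure is anticipated; the trade-off is replication traffic versus lost progress.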

A Web Services and Ontology Based Performance Visualization Framework for Grid Environments

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347070

Kukjin Lee, D. Rover

Citations: 1

Optimizing Latency under Throughput Requirements for Streaming Applications on Cluster Execution

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347051

F. Guirado, A. Ripoll, C. Roig, E. Luque

Citations: 13

Cluster Spanning with Virtual Environments

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347087

Wesley Emeneker, D. Stanzione

Abstract: Summary form only given. The ability to easily span parallel and distributed jobs over multiple physical clusters is a potentially attractive proposal. Such an ability would allow researchers to pool all available cluster resources at a given site. Multiple clusters at a single research site have become the norm, whether in an academic, government, or industrial environment. However, there are substantial barriers to transparently spanning clusters in this environment, not the least of which are authentication, authorization, data sharing, library mismatches, version skew, compiler access, environment variables, and networking. These problems arise from the heterogeneity of software environments; to circumvent many of them, we look at virtualization and its potential use to create dynamic virtual clusters that provide a consistent, transparent environment for job spanning. The Grid computing community has addressed the problem of spanning jobs across distributed resources. However, grid solutions tend to focus on the authorization, authentication, and data migration problems associated with far-flung resources communicating over public network links. In a campus grid environment, where clusters can be directly connected via internal networks, many of these problems disappear. An alternative way to accomplish this spanning is to use common virtualization techniques to create virtual clusters made up of one or more physical clusters. In this work, we examine the ontology of virtualization techniques, from the least amount of virtualization to the most, and evaluate their suitability for building dynamic virtual clusters.

Citations: 1

Dynamic Multi-Resource Monitoring for Predictive Job Scheduling with ScoPro

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347013

A. Sodan, Lun Liu

Citations: 6

Interoperability through Conformance; The iWARP validation process

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347029

Barry Reinhold

Citations: 0

A Case for Cooperative and Incentive-Based Coupling of Distributed Clusters

2005 IEEE International Conference on Cluster Computing. Pub Date: 2005-09-01. DOI: 10.1109/CLUSTR.2005.347038

R. Ranjan, R. Buyya, A. Harwood

Citations: 38