2005 IEEE International Conference on Cluster Computing: Latest Papers

Transparently Achieving Superior Socket Performance Using Zero Copy Socket Direct Protocol over 20Gb/s InfiniBand Links
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347027
Dror Goldenberg, Michael Kagan, Ran Ravid, Michael S. Tsirkin
Abstract: Sockets Direct Protocol (SDP) is a byte stream protocol that utilizes the capabilities of the InfiniBand fabric to transparently achieve performance gains for existing socket-based networked applications. In this paper we discuss an implementation of Zero Copy support for synchronous send()/recv() socket calls that uses the remote DMA capability of InfiniBand for SDP data transfers. We added this support to the open-source implementation of SDP over InfiniBand. We evaluate this implementation over a 20 Gb/s InfiniBand link. We demonstrate scalability of Zero Copy and show its benefits for systems that utilize multiple socket connections in parallel. For example, enabling Zero Copy with 8 active connections yields a bandwidth growth from 630MB/s to 1360MB/s, at the same time reducing the CPU utilization by a factor of ten.
Citations: 20
Multi-stream MPA
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347033
Caitlin Bestler
Abstract: An extension to MPA, the TCP adaptation layer for RDMA, is proposed to provide the benefits of multi-streaming and multi-homing without requiring RNICs to implement SCTP.
Citations: 2
Optimizing Latency under Throughput Requirements for Streaming Applications on Cluster Execution
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347051
F. Guirado, A. Ripoll, C. Roig, E. Luque
Abstract: Parallelism in applications that act on a stream of input data can be exploited with two different approaches, spatial and temporal. In this paper we propose a new task mapping algorithm, called EXPERT, to exploit temporal parallelism efficiently when the streaming application is running in a pipeline fashion. We compare the performance of the spatial and temporal approaches in terms of latency and throughput for a video compression application. The results show that pipeline execution with the task assignment provided by the EXPERT algorithm significantly outperforms spatial parallelism. Additionally, this temporal parallelism shows better scalability as the problem size increases.
Citations: 13
Cluster Spanning with Virtual Environments
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347087
Wesley Emeneker, D. Stanzione
Abstract: Summary form only given. The ability to easily span parallel and distributed jobs over multiple physical clusters is a potentially attractive proposal. Such an ability would allow researchers to pool all available cluster resources at a given site. Having multiple clusters at a single research site has become the norm, whether in an academic, government, or industrial environment. However, there are substantial barriers to transparently spanning clusters in this environment, not the least of which are authentication, authorization, sharing data, library mismatching, version skew, compiler access, environment variables, and networking. These problems arise from the heterogeneity of software environments, and in order to circumvent many of these issues, we look at virtualization and its potential use to create dynamic virtual clusters in order to provide a consistent, transparent environment for job spanning. The Grid computing community has addressed the problem of spanning jobs across distributed resources. However, grid solutions tend to focus on authorization, authentication, and data migration problems associated with far-flung resources communicating over public network links. In a campus grid environment, where clusters can be directly connected via internal networks, many of these problems disappear. An alternative to the grid approach for accomplishing this spanning is to use common virtualization techniques to create virtual clusters made up of one or more physical clusters. In this work, we examine the ontology of virtualization techniques from the least amount of virtualization to the most, and evaluate their suitability for building dynamic virtual clusters.
Citations: 1
Spare Instance: an Adaptive Mechanism for Managing Cluster Applications
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347092
Yingyin Jiang, Dan Meng, Danjun Liu
Abstract: Performance measurements were conducted on a Dawning 4000 cluster. Putting the spare instance into use by redirection took 0.12 sec, whereas under the old policy the time for an Apache instance was 1.05 sec; for Oracle 10G, the cost was 7.69 sec. The spare servers contribute to a public spare resource pool. If the master instance fails, the spare instance avoids a service gap and provides non-stop service. For SLA violations caused by short-term overload, the available resources can be enlarged instantaneously, much more quickly than by temporarily launching a new instance. In the best case, the service capacity may be doubled, so QoS performance can be guaranteed. Furthermore, under workload fluctuation, the overhead of frequently launching and terminating instances is cut down. By introducing the spare instance, the adaptive capacity of the application is greatly improved under short-term overload and workload fluctuation.
Citations: 0
Dynamic Multi-Resource Monitoring for Predictive Job Scheduling with ScoPro
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347013
A. Sodan, Lun Liu
Abstract: Modern job schedulers move towards applying dynamic approaches like time sharing or adaptive resource allocation to accommodate grid jobs or to better utilize local resources. Also, the resources may be heterogeneous, and a proper distribution of the application's workload may be hard to estimate. Our ScoPro monitoring tool makes it possible to obtain and store resource-related behavior information for parallel applications. This information is used to create an application signature for predictive use in future runs and to dynamically check competition under time-shared execution and imbalances of workload on heterogeneous resources. ScoPro is applicable to production runs on standard clusters. As its main innovative contributions, ScoPro can be triggered by job-scheduling events, can monitor several coscheduled jobs concurrently for accurate prediction of slowdowns, and performs realtime short-period measurements with low intrusion during the monitoring, while avoiding any intrusion overhead for the non-monitored part of the job execution.
Citations: 6
A Web Services and Ontology Based Performance Visualization Framework for Grid Environments
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347070
Kukjin Lee, D. Rover
Abstract: A combination of Web services and ontology technologies has the potential to provide a consistent, seamless, and intelligent integration of heterogeneous resources. Using these technologies, we present a new visualization framework which integrates distributed and heterogeneous performance information into system-level visualizations. Most performance visualization tools are specific to a particular resource at a certain level of the system, possibly with fixed views. Thus, they limit a user's ability to observe a performance problem associated with multiple resources across platforms and semantic levels. Addressing this issue with Web services and ontology, the new framework allows resources, at different levels, to be viewed and interacted with in a consistent and coordinated manner. This paper describes our framework, focusing on how Web services and ontology help address challenging issues in performance visualization for grid environments.
Citations: 1
A Case for Cooperative and Incentive-Based Coupling of Distributed Clusters
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347038
R. Ranjan, R. Buyya, A. Harwood
Abstract: Interest in grid computing has grown significantly over the past five years. Management of distributed cluster resources is a key issue in grid computing. Central to management of resources is the effectiveness of resource allocation, as it determines the overall utility of the system. In this paper, we propose a new grid system that consists of grid federation agents which couple together distributed cluster resources to enable a cooperative environment. The agents use a computational economy methodology that facilitates QoS scheduling, with a cost-time scheduling heuristic based on a scalable, shared federation directory. We show by simulation that, while some users who are local to popular resources can experience higher cost and/or longer delays, the overall users' QoS demands across the federation are better met. Also, the federation's average case message passing complexity is seen to be scalable, though some jobs in the system may lead to large numbers of messages before being scheduled.
Citations: 38
Job-Site Level Fault Tolerance for Cluster and Grid environments
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347043
K. Limaye, C. Leangsuksun, Z. Greenwood, S. Scott, C. Engelmann, Richard Libby, K. Chanchio
Abstract: In order to adopt high performance clusters and grid computing for mission critical applications, fault tolerance is a necessity. Common fault tolerance techniques in distributed systems are normally achieved with checkpoint-recovery and job replication on alternative resources, in cases of a system outage. The first approach depends on the system's MTTR while the latter approach depends on the availability of alternative sites to run replicas. There is a need for complementing these approaches by proactively handling failures at a job-site level, ensuring high system availability with no loss of user submitted jobs. This paper discusses a novel fault tolerance technique that enables job-site recovery in Beowulf cluster-based grid environments, whereas existing techniques give up a failed system by seeking alternative resources. Our results suggest sizable aggregate performance improvement during an implementation of our method in Globus-enabled HA-OSCAR. The technique, called "smart failover", provides a transparent and graceful recovery mechanism that saves job states in a local job-manager queue and transfers those states to the backup server periodically and on critical system events. Thus whenever a failover occurs, the backup server is able to restart the jobs from their last saved state.
Citations: 29
Interoperability through Conformance; The iWARP validation process
2005 IEEE International Conference on Cluster Computing Pub Date : 2005-09-01 DOI: 10.1109/CLUSTR.2005.347029
Barry Reinhold
Abstract: The ability of systems to interoperate while using the iWARP protocol suite is a well-understood requirement for market acceptance of the technology. Since an improper implementation of the iWARP protocol opens the door to the possibility of silent data corruption, rigorous testing must be in place to ensure that devices properly implement the protocol. This level of verification is difficult to achieve using standard system and interoperability procedures. Verification technology was needed that would enable developers to identify failures in their implementations in the context of specific lower layer protocol conditions. The industry banded together during 2004 to enable the development of verification tools that would provide the detailed testing required by the iWARP community. This paper discusses these tools, the reasoning behind them, what has been achieved to date, and the work that still needs to be done.
Citations: 0