2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)最新文献

筛选
英文 中文
NWPerf: a system wide performance monitoring tool for large Linux clusters NWPerf:用于大型Linux集群的系统范围性能监视工具
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392637
Ryan W. Mooney, Ken P. Schmidt, R. S. Studham
{"title":"NWPerf: a system wide performance monitoring tool for large Linux clusters","authors":"Ryan W. Mooney, Ken P. Schmidt, R. S. Studham","doi":"10.1109/CLUSTR.2004.1392637","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392637","url":null,"abstract":"We present NWPerf, a new system for analyzing fine granularity performance metric data on large-scale supercomputing clusters. This tool is able to measure application efficiency on a system wide basis from both a global system perspective as well as providing a detailed view of individual applications. NWPerf provides this service while minimizing the impact on the performance of user applications. We describe the type of information that can be derived from the system, and demonstrate how the system was used detect and eliminate a performance problem in an application application that improved performance by up to several thousand percent. The NWPerf architecture has proven to be a stable and scalable platform for gathering performance data on a large 1954-CPU production Linux cluster at PNNL.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116594965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Implementation and design analysis of a network messaging module using virtual interface architecture 基于虚拟接口架构的网络消息传递模块的实现与设计分析
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392623
G. Amerson, A. Apon
{"title":"Implementation and design analysis of a network messaging module using virtual interface architecture","authors":"G. Amerson, A. Apon","doi":"10.1109/CLUSTR.2004.1392623","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392623","url":null,"abstract":"The buffered message interface (BMI) of PVFSv2 is a low level network abstraction that allows PVFSv2 to operate on any protocol that has BMI support. This work presents a BMI module that supports the VIA over an early release version of InfiniBand and also over Myrinet. The baseline bandwidth and latency of the implementation were compared to the BMI modules and were shown to achieve significantly higher performance than the TCP module, but slightly less than the CM module. Experimental results comparing a completion queue version with a notify version and using immediate versus rendezvous messages are useful to system implementors of network messaging modules.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122420533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 14
Kerrighed and data parallelism: cluster computing on single system image operating systems kerright和数据并行:单系统映像操作系统上的集群计算
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392625
C. Morin, Renaud Lottiaux, Geoffroy R. Vallée, Pascal Gallard, D. Margery, J. Berthou, I. Scherson
{"title":"Kerrighed and data parallelism: cluster computing on single system image operating systems","authors":"C. Morin, Renaud Lottiaux, Geoffroy R. Vallée, Pascal Gallard, D. Margery, J. Berthou, I. Scherson","doi":"10.1109/CLUSTR.2004.1392625","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392625","url":null,"abstract":"A working single system image distributed operating system is presented. Dubbed Kerrighed, it provides a unified approach and support to both the MPI and the shared memory programming models. The system is operational in a 16-processor cluster at the Institut de Recherche en Informatique et Systemes Aleatoires in Rennes, France. In this paper, the system is described with emphasis on its main contributing and distinguishing factors, namely its DSM based on memory containers, its flexible handling of scheduling and checkpointing strategies, and its efficient and unified communications layer. Because of the importance and popularity of data parallel applications in these systems, we present a brief discussion of the mapping of two well known and established data parallel algorithms. It is shown that ShearSort is remarkably well suited for the architecture/system pair as is the ever so popular and important two-dimensional fast Fourier transform. (2D FFT).","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114682718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
A distributed data management middleware for data-driven application systems 用于数据驱动的应用程序系统的分布式数据管理中间件
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392624
S. Langella, S. Hastings, S. Oster, T. Kurç, Ümit V. Çatalyürek, J. Saltz
{"title":"A distributed data management middleware for data-driven application systems","authors":"S. Langella, S. Hastings, S. Oster, T. Kurç, Ümit V. Çatalyürek, J. Saltz","doi":"10.1109/CLUSTR.2004.1392624","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392624","url":null,"abstract":"A key challenge in supporting data-driven scientific applications is the storage and management of input and output data in a distributed environment. We describe a distributed storage middleware, based on a data and metadata management framework, to address this problem. In this middleware system, applications define the structure of their input and output data using XML schemas. The system provides support for 1) registration, versioning, management of schemas, and 2) management of storage, querying, and retrieval of instance data corresponding to the schemas in distributed databases. We carry out an experimental evaluation of the system on a set of PC clusters connected over wide- (WANs) and local-area networks (LANs).","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127660938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
Predicting resource demand profiles by periodicity mining 利用周期性挖掘预测资源需求曲线
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392648
A. Andrzejak, Mehmet Ceyran
{"title":"Predicting resource demand profiles by periodicity mining","authors":"A. Andrzejak, Mehmet Ceyran","doi":"10.1109/CLUSTR.2004.1392648","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392648","url":null,"abstract":"Summary form only given. Scientific computing clusters, enterprise data centers and grid and utility environments utilize the majority of the world's computing resources. Most of these resources are lightly utilized and offer a vast potential for resource sharing, an economically attractive and increasingly indispensable management option. A prerequisite for automating resource consolidation is modeling and prediction of demand characteristics. We present an approach for long-term demand characteristics prediction based on mining periodicities in historical demand data. In addition to characterizing the regularity of the past demand behavior (and so providing a measure of predictability) we propose a method for predicting probabilistic profiles which describe likely future behavior. The presented algorithms are change-adaptive in the sense that they automatically adjust to new regularities in demand patterns. A case study using data from an enterprise data center evaluates the effectiveness of the technique.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123065463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Application-specific scheduling for the organic grid 有机网格的特定于应用程序的调度
A. Chakravarti, Gerald Baumgartner, Mario Lauria
{"title":"Application-specific scheduling for the organic grid","authors":"A. Chakravarti, Gerald Baumgartner, Mario Lauria","doi":"10.1109/GRID.2004.11","DOIUrl":"https://doi.org/10.1109/GRID.2004.11","url":null,"abstract":"Summary form only given. We propose a biologically inspired and fully-decentralized approach to the organization of computation that is based on the autonomous scheduling of strongly mobile agents on a peer-to-peer network. Our approach achieves the following design objectives: near-zero knowledge of network topology, zero knowledge of system status, autonomous scheduling, distributed computation, lack of specialized nodes. Every node is equally responsible for scheduling and computation, both of which are performed with practically no information about the system. We believe that this model is ideally suited for large-scale unstructured grids such as desktop grids. This model avoids the extensive system knowledge requirements of traditional grid scheduling approaches. Contrary to the popular master/worker organization of current desktop grids, our approach does not rely on specialized super-servers or on application-specific clients. By encapsulating computation and scheduling behavior into mobile agents, we decouple both application code and scheduling functionality from the underlying infrastructure. The resulting system is one where every node can start a large grid job, and where the computation naturally organizes itself around available resources. Through the careful design of agent behavior, the resulting global organization of the computation can be customized for different classes of applications. In a previous paper, we described a proof-of-concept prototype for an independent task application. We generalize the scheduling framework and demonstrate that our approach is applicable to a computation with a highly synchronous communication pattern, namely Cannon's matrix multiplication.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131868112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 27
A parallel I/O mechanism for distributed systems 分布式系统的并行I/O机制
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392602
Troy Baer, P. Wyckoff
{"title":"A parallel I/O mechanism for distributed systems","authors":"Troy Baer, P. Wyckoff","doi":"10.1109/CLUSTR.2004.1392602","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392602","url":null,"abstract":"Access to shared data is critical to the long term success of grids of distributed systems. As more parallel applications are being used on these grids, the need for some kind of parallel I/O facility across distributed systems increases. However, grid middleware has thus far had only limited support for distributed parallel I/O. In this paper, we present an implementation of the MPI-2 I/O interface using the Globus GridFTP client API. MPI is widely used for parallel computing, and its I/O interface maps onto a large variety of storage systems. The limitations of using GridFTP as an MPI-I/O transport mechanism are described, as well as support for parallel access to scientific data formats such as HDF and NetCDF. We compare the performance of GridFTP to that of NFS on the same network using several parallel I/O benchmarks. Our tests indicate that GridFTP can be a workable transport for parallel I/O, particularly for distributed read-only access to shared data sets.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134195708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 13
Building highly available HPC clusters with HA-OSCAR 使用HA-OSCAR构建高可用的HPC集群
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392593
C. Leangsuksun, I. Haddad
{"title":"Building highly available HPC clusters with HA-OSCAR","authors":"C. Leangsuksun, I. Haddad","doi":"10.1109/CLUSTR.2004.1392593","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392593","url":null,"abstract":"Summary form only given. This tutorial addressed in detail all the design and implementation issues related to building HA Linux Beowulf clusters and using Linux and open source software as the base technology. In addition, the focus of the tutorial is HA-OSCAR. We present the architecture of HA-OSCAR, review of new features of the current release, explain how we implemented all the HA features, and discuss our experiments covering performance and availability, as well as our test results.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124712785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
Master slave scheduling on heterogeneous star-shaped platforms with limited memory 有限内存异构星形平台上的主从调度
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392654
Arnaud Legrand, Olivier Beaumont, L. Marchal, Y. Robert
{"title":"Master slave scheduling on heterogeneous star-shaped platforms with limited memory","authors":"Arnaud Legrand, Olivier Beaumont, L. Marchal, Y. Robert","doi":"10.1109/CLUSTR.2004.1392654","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392654","url":null,"abstract":"Summary form only given. In this work, we consider the problem of allocating and scheduling a collection of independent, equal-sized tasks on heterogeneous star-shaped platforms. We also address the same problem for divisible tasks. For both cases, we take memory constraints into account. We prove strong NP-completeness results for different objective functions, namely makespan minimization and throughput maximization, on simple star-shaped platforms. We propose an approximation algorithm based on the unconstrained version (with unlimited memory) of the problem. We introduce several heuristics, which are evaluated and compared through extensive simulations. An unexpected conclusion drawn from these experiments is that classical scheduling heuristics that try to greedily minimize the completion time of each task are outperformed by the simple heuristic that consists in assigning the task to the available processor that has the smallest communication time, regardless of computation power (hence a \"bandwidth-centric\" distribution).","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129972994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
Rolls: modifying a standard system installer to support user-customizable cluster frontend appliances Rolls:修改标准系统安装程序以支持用户可定制的集群前端设备
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392641
Greg Bruno, M. Katz, Federico D. Sacerdoti, P. Papadopoulos
{"title":"Rolls: modifying a standard system installer to support user-customizable cluster frontend appliances","authors":"Greg Bruno, M. Katz, Federico D. Sacerdoti, P. Papadopoulos","doi":"10.1109/CLUSTR.2004.1392641","DOIUrl":"https://doi.org/10.1109/CLUSTR.2004.1392641","url":null,"abstract":"The Rocks toolkit uses a graph-based framework to describe the configuration of all node types (termed appliances) that make up a complete cluster. With hundreds of deployed clusters, our turnkey systems approach has shown to be quite easily adapted to different hardware and logical node configurations. However, the Rocks architecture and implementation contains a significant asymmetry: the graph definition of all appliance types except the initial frontend can be modified and extended by the end-user before installation. However, frontends can be modified only afterward by hands-on system administration. To address this administrative discontinuity between nodes and frontends, we describe the design and implementation of Rolls. First and foremost, Rolls provide both the architecture and mechanisms that enable the end-user to incrementally and programmatically modify the graph description for all appliance types. New functionality can be added and any Rocks-supplied software component can be overwritten or removed simply by inserting the desired Roll CD(s) at installation time. This symmetric approach to cluster construction has allowed us to shrink the core of the Rocks implementation while increasing flexibility for the end-user. Rolls are optional, automatically configured, cluster-aware software systems. Current add-ons include: scheduling systems (SGE, PBS), grid support (based on NSF Middleware Initiative), database support (DB2), Condor, integrity checking (Tripwire) and the Intel compiler. Community-specific Rolls can be and are developed by groups outside of the Rocks core development group.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130566356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 33
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信