2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935): Latest Papers

Towards informatic analysis of syslogs
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392628
Jon Stearley
Abstract: The complexity and cost of isolating the root cause of system problems in large parallel computers generally scale with the size of the system. Syslog messages provide a primary source of system feedback, but manual review is tedious and error prone. Informatic analysis can be used to detect subtle anomalies in the syslog message stream, thereby increasing the availability of the overall system. In this work, the author describes the use of the bioinformatic-inspired Teiresias algorithm to automatically classify syslog messages and compares it to an existing log analysis tool (SLCT). He then describes the use of occurrence statistics to group time-correlated messages and presents a simple graphical user interface for viewing analysis results. Finally, example analyses of syslogs from three independent clusters are presented.
Citations: 142
Analysis of microbenchmarks for performance tuning of clusters
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392636
M. Sottile, R. Minnich
Abstract: Microbenchmarks, i.e., very small computational kernels, have become commonly used for quantitative measures of node performance in clusters. For example, a commonly used benchmark measures the amount of time required to perform a fixed quantum of work. Unfortunately, this benchmark is one of many that violate well-known rules from sampling theory, leading to erroneous, contradictory, or misleading results. At a minimum, these types of benchmarks cannot be used to identify time-based activities that may interfere with and hence limit application performance. Our original and primary goal remains to identify noise in the system due to periodic activities that are not part of user application code. We discuss why the 'fixed quantum of work' benchmark provides data that is of limited use for analysis, and we show code for, discuss, and analyze results from a microbenchmark that follows good rules of sampling hygiene and hence provides useful data for analysis.
Citations: 46
NIC-based offload of dynamic user-defined modules for Myrinet clusters
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392618
A. Wagner, Hyun-Wook Jin, D. Panda, R. Riesen
Abstract: Many of the modern networks used to interconnect nodes in cluster-based computing systems provide network-interface cards (NICs) that offer programmable processors. Substantial research has been done with the focus of offloading processing from the host to the NIC processor. However, the research has primarily focused on the static offload of specific features to the NIC, mainly to support the optimization of common collective and synchronization-based communications. We describe the design and implementation of a framework based on MPICH-GM to support the dynamic NIC-based offload of user-defined modules for Myrinet clusters. We evaluate our implementation on a 16-node cluster using a NIC-based version of the common broadcast operation, and we find a maximum factor of improvement of 1.2 with respect to total latency as well as a maximum factor of improvement of 2.2 with respect to average CPU utilization under conditions of process skew. In addition, we see that these improvements increase with system size, indicating that our NIC-based framework offers enhanced scalability when compared to a purely host-based approach.
Citations: 22
Hierarchical Bloom filter arrays (HBA): a novel, scalable metadata management system for large cluster-based storage
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392614
Yifeng Zhu, Hong Jiang, Jun Wang
Abstract: An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This work presents a technique called HBA (hierarchical Bloom filter arrays) to map file names to the servers holding their metadata. Two levels of probabilistic arrays, i.e., Bloom filter arrays, with different accuracies are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, while the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Extensive trace-driven simulations have shown our HBA design to be highly effective and efficient in improving performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters).
Citations: 56
QMP-MVIA: a message passing system for Linux clusters with gigabit Ethernet mesh connections
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392651
Jie Chen, R. Edwards, W. Mao
Abstract: Recent progress in performance coupled with a decline in price for copper-based gigabit Ethernet (GigE) interconnects makes them an attractive alternative to expensive high speed network interconnects (NIC) when constructing Linux clusters. However, traditional message passing systems based on TCP for GigE interconnects cannot fully utilize the raw performance of today's GigE interconnects due to the overhead of kernel involvement and multiple memory copies during sending and receiving messages. The overhead is more evident in the case of mesh connected Linux clusters using multiple GigE interconnects in a single host. We present a general message passing system called QMP-MVIA (QCD Message Passing over M-VIA) for Linux clusters with mesh connections using GigE interconnects. In particular, we evaluate and compare the performance characteristics of TCP and M-VIA (an implementation of the VIA specification) software for a mesh communication architecture to demonstrate the feasibility of using M-VIA as point-to-point communication software, on which QMP-MVIA is based. Furthermore, we illustrate the design and implementation of QMP-MVIA for mesh connected Linux clusters with emphasis on both point-to-point and collective communications, and demonstrate that the QMP-MVIA message passing system using GigE interconnects achieves bandwidth and latency that are not only better than systems based on TCP but also compare favorably to systems using some of the specialized high speed interconnects in a switched architecture at much lower cost.
Citations: 2
XChange: coupling parallel applications in a dynamic environment
2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935) Pub Date : 2004-09-20 DOI: 10.1109/CLUSTR.2004.1392646
H. Abbasi, M. Wolf, K. Schwan, G. Eisenhauer, Andrew D. Hilton
Abstract: Modern computational science applications are becoming increasingly multidisciplinary, involving widely distributed research teams and their underlying computational platforms. A common problem for the grid applications used in these environments is the necessity to couple multiple, parallel subsystems, with examples ranging from data exchanges between cooperating, linked parallel programs, to concurrent data streaming to distributed storage engines. This work presents the XChange_mxn middleware infrastructure for coupling componentized distributed applications. XChange_mxn implements the basic functionality of well-known services like the CCA Forum's MxN project, by providing efficient data redistribution across parallel application components. Beyond such basic functionality, however, XChange_mxn also addresses two of the problems faced by wide area scientific collaborations, which are (1) the need to deal with dynamic application/component behaviors, such as dynamic arrivals and departures due to the availability of additional resources, and (2) the need to 'match' data formats across disparate application components and research teams. In response to these needs, XChange_mxn uses an anonymous publish/subscribe model for linking interacting components, and the data being exchanged is dynamically specialized and transformed to match end point requirements. The pub/sub paradigm makes it easy to deal with dynamic component arrivals and departures. Dynamic data transformation enables the 'inflight' correction of data or needs mismatches for cooperating components. This work describes the design and implementation of XChange_mxn, and it evaluates its implementation compared to those of less flexible transports like MPI. It also highlights the utility of XChange_mxn's 'inflight' data specialization, by applying it to the SmartPointer parallel data visualization environment developed at our institution. Interestingly, using XChange_mxn did not significantly affect performance but led to a reduction in the size of the code base.
Citations: 23