Scalable cluster administration - Chiba City I approach and lessons learned

J. Navarro, R. Evard, Daniel Nurmi, N. Desai
Proceedings. IEEE International Conference on Cluster Computing, published 2002-09-23
DOI: 10.1109/CLUSTR.2002.1137749 (https://doi.org/10.1109/CLUSTR.2002.1137749)
Citations: 9

Abstract

Systems administrators of large clusters often need to perform the same administrative task hundreds or thousands of times. Traditionally, administrators have performed time-consuming tasks such as operating system installation, configuration, and maintenance manually. By combining network services such as DHCP, TFTP, FTP, HTTP, and NFS with remote hardware control and scripted installation, configuration, and maintenance techniques, cluster administrators can automate these tasks. Scalable cluster administration addresses this challenge: what hardware and software design techniques can cluster builders use to automate administration on very large clusters? We describe the approach used in the Mathematics and Computer Science Division of Argonne National Laboratory on Chiba City I, a 314-node Linux cluster, and we analyze the scalability, flexibility, performance, and reliability benefits and limitations of that approach.
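
The core problem the abstract names, running one administrative task across hundreds of nodes, can be illustrated with a minimal sketch. This is not the Chiba City I tooling, which the abstract does not detail; it only shows the general pattern of applying a scripted task to many nodes concurrently and collecting failures for follow-up. The node names, the `run_on_nodes` helper, and the `fake_reinstall` task are all hypothetical placeholders; a real task would invoke remote hardware control or a network installation instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def node_names(prefix, count):
    """Generate hypothetical node names, e.g. node001 ... node314."""
    return [f"{prefix}{i:03d}" for i in range(1, count + 1)]


def run_on_nodes(nodes, task, max_workers=32):
    """Apply `task` to every node concurrently.

    Returns (successes, failures): two dicts keyed by node name, holding
    the task's return value or the exception it raised.
    """
    successes, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the node it belongs to.
        futures = {pool.submit(task, node): node for node in nodes}
        for future in as_completed(futures):
            node = futures[future]
            try:
                successes[node] = future.result()
            except Exception as exc:
                # One bad node must not stop the run; record it instead.
                failures[node] = exc
    return successes, failures


def fake_reinstall(node):
    """Stand-in for a real scripted install (assumption, does no real work)."""
    if node == "node013":
        raise RuntimeError("node did not respond")
    return "installed"


if __name__ == "__main__":
    nodes = node_names("node", 314)
    ok, failed = run_on_nodes(nodes, fake_reinstall)
    print(f"{len(ok)} succeeded, {len(failed)} failed: {sorted(failed)}")
```

Bounding the worker count matters at this scale: the services that feed the nodes (for example an install server) are shared, so unbounded parallelism can overload them, while collecting failures per node lets an administrator retry only the nodes that need attention.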