Chaoqun Sha, Jingfeng Zhang, Lei An, Yongsheng Zhang, Zhipeng Wang, T. Ilijaš, Nejc Bat, Miha Verlic, Qing Ji
{"title":"通过云促进HPC的运营和管理","authors":"Chaoqun Sha, Jingfeng Zhang, Lei An, Yongsheng Zhang, Zhipeng Wang, T. Ilijaš, Nejc Bat, Miha Verlic, Qing Ji","doi":"10.14529/JSFI190105","DOIUrl":null,"url":null,"abstract":"Experiencing a tremendous growth, Cloud Computing offers a number of advantages over other distributed platforms. Introducing the advantages of High Performance Computing (HPC) also brought forward the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and the no-maintenance-needed for end-users. Besides providing and using HPCaaS, HPC centers could leverage more from Cloud Computing technology, for instance to facilitate operation and administration of deployed HPC systems, commonly faced by most supercomputer centers. This paper reports the product, EasyOP, developed to realize the idea that one or more Cloud or HPC facilities can be run over a centralized and unified control platform. The main purpose of EasyOP is that the information of HPC systems hardware and system software, failure alarms, jobs scheduling, etc. is sent to the Wuxi cloud computing center. After a series of analysis and processing, we are able to share many valuable data, including alarm and job scheduling status, to HPC users through SMS, email, and WeChat. More importantly, with the data accumulated on the cloud computing center, EasyOP can offer several easy-to-use functions, such as user(s) management, monthly/yearly reports, one-screen monitoring and so on. By the end of 2016, EasyOP successfully served more than 50 HPC systems with almost 10000 nodes and over of 300 regular users.","PeriodicalId":338883,"journal":{"name":"Supercomput. Front. Innov.","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Facilitating HPC Operation and Administration via Cloud\",\"authors\":\"Chaoqun Sha, Jingfeng Zhang, Lei An, Yongsheng Zhang, Zhipeng Wang, T. Ilijaš, Nejc Bat, Miha Verlic, Qing Ji\",\"doi\":\"10.14529/JSFI190105\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Experiencing a tremendous growth, Cloud Computing offers a number of advantages over other distributed platforms. Introducing the advantages of High Performance Computing (HPC) also brought forward the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and the no-maintenance-needed for end-users. Besides providing and using HPCaaS, HPC centers could leverage more from Cloud Computing technology, for instance to facilitate operation and administration of deployed HPC systems, commonly faced by most supercomputer centers. This paper reports the product, EasyOP, developed to realize the idea that one or more Cloud or HPC facilities can be run over a centralized and unified control platform. The main purpose of EasyOP is that the information of HPC systems hardware and system software, failure alarms, jobs scheduling, etc. is sent to the Wuxi cloud computing center. After a series of analysis and processing, we are able to share many valuable data, including alarm and job scheduling status, to HPC users through SMS, email, and WeChat. More importantly, with the data accumulated on the cloud computing center, EasyOP can offer several easy-to-use functions, such as user(s) management, monthly/yearly reports, one-screen monitoring and so on. By the end of 2016, EasyOP successfully served more than 50 HPC systems with almost 10000 nodes and over of 300 regular users.\",\"PeriodicalId\":338883,\"journal\":{\"name\":\"Supercomput. Front. Innov.\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-02-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Supercomput. Front. Innov.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14529/JSFI190105\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Supercomput. Front. Innov.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14529/JSFI190105","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
摘要
经历了巨大的增长,云计算提供了许多优于其他分布式平台的优势。高性能计算(High Performance Computing, HPC)优势的引入,带动了HPC as a Service (HPC as a Service)的发展。HPC as a Service主要关注于资源的灵活访问、成本效益和终端用户无需维护。除了提供和使用HPC caas之外,HPC中心还可以更多地利用云计算技术,例如,简化部署的HPC系统的操作和管理,这是大多数超级计算机中心普遍面临的问题。本文报告了EasyOP产品的开发,以实现一个或多个云或HPC设施可以在一个集中和统一的控制平台上运行的想法。EasyOP的主要目的是将HPC系统软硬件、故障报警、作业调度等信息发送到无锡云计算中心。经过一系列的分析和处理,我们可以通过短信、邮件、微信等方式将报警、作业调度等许多有价值的数据分享给HPC用户。更重要的是,随着数据在云计算中心的积累,EasyOP可以提供几个易于使用的功能,如用户管理,月/年报告,一屏监控等。截至2016年底,EasyOP已成功服务50多个HPC系统,节点近10000个,固定用户超过300个。
Facilitating HPC Operation and Administration via Cloud
Experiencing a tremendous growth, Cloud Computing offers a number of advantages over other distributed platforms. Introducing the advantages of High Performance Computing (HPC) also brought forward the development of HPCaaS (HPC as a Service), which has mainly focused on flexible access to resources, cost-effectiveness, and the no-maintenance-needed for end-users. Besides providing and using HPCaaS, HPC centers could leverage more from Cloud Computing technology, for instance to facilitate operation and administration of deployed HPC systems, commonly faced by most supercomputer centers. This paper reports the product, EasyOP, developed to realize the idea that one or more Cloud or HPC facilities can be run over a centralized and unified control platform. The main purpose of EasyOP is that the information of HPC systems hardware and system software, failure alarms, jobs scheduling, etc. is sent to the Wuxi cloud computing center. After a series of analysis and processing, we are able to share many valuable data, including alarm and job scheduling status, to HPC users through SMS, email, and WeChat. More importantly, with the data accumulated on the cloud computing center, EasyOP can offer several easy-to-use functions, such as user(s) management, monthly/yearly reports, one-screen monitoring and so on. By the end of 2016, EasyOP successfully served more than 50 HPC systems with almost 10000 nodes and over of 300 regular users.