Redfish-Nagios: A Scalable Out-of-Band Data Center Monitoring Framework Based on Redfish Telemetry Model

Fifth International Workshop on Systems and Network Telemetry and Analytics. Pub Date: 2022-06-27. DOI: 10.1145/3526064.3534108

Ghazanfar Ali, Jon R. Hass, A. Sill, E. Hojati, Tommy Dang, Yong Chen


Citations: 4

Abstract


Redfish-Nagios: A Scalable Out-of-Band Data Center Monitoring Framework Based on Redfish Telemetry Model

Current monitoring tools for high-performance computing (HPC) systems often scale poorly and interface poorly with modern data center management APIs, which hampers effective management of modern data center infrastructure. Nagios is a widely used industry-standard tool for data center infrastructure monitoring, which mainly involves monitoring nodes and their associated hardware and software components. However, current Nagios monitoring has special requirements that introduce several limitations. First, configuring the monitored nodes in the Nagios server requires significant human effort. Second, the Nagios Remote Plugin Executor (NRPE) and the Nagios Service Check Acceptor (NSCA) are required on the Nagios server and on each monitored node for active and passive monitoring, respectively. Third, Nagios monitoring also requires monitoring-specific agents on each monitored node. These shortcomings stem from the in-band nature of the Nagios implementation. To overcome these limitations, we introduced Redfish-Nagios, a scalable out-of-band monitoring tool for modern HPC systems. It integrates the Nagios server with the out-of-band Redfish telemetry model of the Distributed Management Task Force (DMTF), which is implemented in the baseboard management controller (BMC) of each node. This integration eliminates the need for any agent, plugin, hardware component, or configuration on the monitored nodes. It is potentially a paradigm shift in Nagios-based monitoring for two reasons. First, it simplifies communication between the Nagios server and the monitored nodes. Second, it reduces computational cost by removing the need to run complex Nagios-native protocols and agents on the monitored nodes. The Redfish-Nagios integration methodology enables monitoring of next-generation HPC systems using the scalable and modern Redfish telemetry model and interface.
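To make the out-of-band idea concrete, the following is a minimal sketch of how a check running only on the Nagios server might read temperature telemetry from a node's baseboard management controller over Redfish and reduce it to a Nagios state. It is not the authors' implementation, which the abstract does not describe. The function names, the sample payload, and the choice of the `/redfish/v1/Chassis/{id}/Thermal` resource are assumptions made for illustration; the exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import base64
import json
import ssl
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Redfish Status.Health values mapped onto Nagios states.
HEALTH_TO_NAGIOS = {"OK": OK, "Warning": WARNING, "Critical": CRITICAL}

# Ranking used to pick the worst state (an assumption: UNKNOWN ranks below WARNING).
SEVERITY = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def fetch_thermal(bmc_host, chassis_id, user, password, timeout=10):
    """Hypothetical helper: GET the Thermal resource from a BMC with HTTP Basic auth.

    Runs on the Nagios server only; nothing is installed on the monitored node.
    Certificate verification is left at the default, so a BMC with a
    self-signed certificate would need its CA added to the trust store.
    """
    url = f"https://{bmc_host}/redfish/v1/Chassis/{chassis_id}/Thermal"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    context = ssl.create_default_context()
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.load(response)


def check_thermal(thermal):
    """Reduce a Redfish Thermal payload to (Nagios exit code, status line)."""
    worst = OK
    parts = []
    for sensor in thermal.get("Temperatures", []):
        status = sensor.get("Status") or {}
        if status.get("State") == "Absent":
            continue  # skip sensors for components that are not installed
        state = HEALTH_TO_NAGIOS.get(status.get("Health"), UNKNOWN)
        if SEVERITY[state] > SEVERITY[worst]:
            worst = state
        parts.append(f"{sensor.get('Name', 'sensor')}={sensor.get('ReadingCelsius')}C")
    if not parts:
        return UNKNOWN, "UNKNOWN - no temperature readings reported"
    return worst, f"{STATE_NAMES[worst]} - " + ", ".join(parts)


# Illustrative payload shaped like a Redfish Thermal resource (values are made up).
SAMPLE = {
    "Temperatures": [
        {"Name": "Inlet Temp", "ReadingCelsius": 24,
         "Status": {"State": "Enabled", "Health": "OK"}},
        {"Name": "CPU1 Temp", "ReadingCelsius": 88,
         "Status": {"State": "Enabled", "Health": "Warning"}},
        {"Name": "CPU2 Temp", "ReadingCelsius": None,
         "Status": {"State": "Absent", "Health": None}},
    ]
}

if __name__ == "__main__":
    code, line = check_thermal(SAMPLE)
    print(line)
    raise SystemExit(code)
```

In a real deployment the payload would come from `fetch_thermal(...)` rather than `SAMPLE`, and Nagios would run the script as an ordinary active check on the server, reading the printed line and the exit code. The parsing is kept separate from the network call so the state-mapping logic can be exercised without access to a BMC.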

