Ghazanfar Ali, Jon R. Hass, A. Sill, E. Hojati, Tommy Dang, Yong Chen
{"title":"Redfish- nagios:基于Redfish遥测模型的可扩展带外数据中心监控框架","authors":"Ghazanfar Ali, Jon R. Hass, A. Sill, E. Hojati, Tommy Dang, Yong Chen","doi":"10.1145/3526064.3534108","DOIUrl":null,"url":null,"abstract":"Current monitoring tools for high-performance computing (HPC) systems are often inefficient in terms of scalability and interfacing with modern data center management APIs. This inefficiency leads to a lack of effective management of infrastructure of modern data centers. Nagios is one of the widely used industry-standard tools for data center infrastructure monitoring, which mainly include monitoring of nodes and associated hardware and software components. However, current Nagios monitoring has special requirements that introduce several limitations. First, a significant human effort is needed for the configuration of monitored nodes in the Nagios server. Second, the Nagios Remote Plugin Executor and the Nagios Service Check Acceptor are required on the Nagios server and each monitored node for active and passive monitoring, respectively. Third, Nagios monitoring also requires monitoring-specific agents on each monitored node. These shortcomings are inherently due to Nagios' in-band implementation nature. To overcome these limitations, we introduced Redfish-Nagios, a scalable out-of-band monitoring tool for modern HPC systems. It integrates the Nagios server with the out-of-band Distributed Management Task Force's Redfish telemetry model, which is implemented in the baseboard management controller of the nodes. This integration eliminates the requirements of any agent, plugin, hardware component, or configuration on the monitored nodes. It is potentially a paradigm shift in Nagios-based monitoring for two reasons. First, it simplifies communication between the Nagios server and monitored nodes. Second, it saves the computational cost by removing the requirements of running complex Nagios-native protocols and agents on the monitored nodes. The Redfish-Nagios integration methodology enables monitoring of next-generation HPC systems using the scalable and modern Redfish telemetry model and interface.","PeriodicalId":183096,"journal":{"name":"Fifth International Workshop on Systems and Network Telemetry and Analytics","volume":"140 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Redfish-Nagios: A Scalable Out-of-Band Data Center Monitoring Framework Based on Redfish Telemetry Model\",\"authors\":\"Ghazanfar Ali, Jon R. Hass, A. Sill, E. Hojati, Tommy Dang, Yong Chen\",\"doi\":\"10.1145/3526064.3534108\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Current monitoring tools for high-performance computing (HPC) systems are often inefficient in terms of scalability and interfacing with modern data center management APIs. This inefficiency leads to a lack of effective management of infrastructure of modern data centers. Nagios is one of the widely used industry-standard tools for data center infrastructure monitoring, which mainly include monitoring of nodes and associated hardware and software components. However, current Nagios monitoring has special requirements that introduce several limitations. First, a significant human effort is needed for the configuration of monitored nodes in the Nagios server. Second, the Nagios Remote Plugin Executor and the Nagios Service Check Acceptor are required on the Nagios server and each monitored node for active and passive monitoring, respectively. Third, Nagios monitoring also requires monitoring-specific agents on each monitored node. These shortcomings are inherently due to Nagios' in-band implementation nature. To overcome these limitations, we introduced Redfish-Nagios, a scalable out-of-band monitoring tool for modern HPC systems. It integrates the Nagios server with the out-of-band Distributed Management Task Force's Redfish telemetry model, which is implemented in the baseboard management controller of the nodes. This integration eliminates the requirements of any agent, plugin, hardware component, or configuration on the monitored nodes. It is potentially a paradigm shift in Nagios-based monitoring for two reasons. First, it simplifies communication between the Nagios server and monitored nodes. Second, it saves the computational cost by removing the requirements of running complex Nagios-native protocols and agents on the monitored nodes. The Redfish-Nagios integration methodology enables monitoring of next-generation HPC systems using the scalable and modern Redfish telemetry model and interface.\",\"PeriodicalId\":183096,\"journal\":{\"name\":\"Fifth International Workshop on Systems and Network Telemetry and Analytics\",\"volume\":\"140 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-06-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Fifth International Workshop on Systems and Network Telemetry and Analytics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3526064.3534108\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Fifth International Workshop on Systems and Network Telemetry and Analytics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3526064.3534108","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Redfish-Nagios: A Scalable Out-of-Band Data Center Monitoring Framework Based on Redfish Telemetry Model
Current monitoring tools for high-performance computing (HPC) systems are often inefficient in terms of scalability and interfacing with modern data center management APIs. This inefficiency leads to a lack of effective management of infrastructure of modern data centers. Nagios is one of the widely used industry-standard tools for data center infrastructure monitoring, which mainly include monitoring of nodes and associated hardware and software components. However, current Nagios monitoring has special requirements that introduce several limitations. First, a significant human effort is needed for the configuration of monitored nodes in the Nagios server. Second, the Nagios Remote Plugin Executor and the Nagios Service Check Acceptor are required on the Nagios server and each monitored node for active and passive monitoring, respectively. Third, Nagios monitoring also requires monitoring-specific agents on each monitored node. These shortcomings are inherently due to Nagios' in-band implementation nature. To overcome these limitations, we introduced Redfish-Nagios, a scalable out-of-band monitoring tool for modern HPC systems. It integrates the Nagios server with the out-of-band Distributed Management Task Force's Redfish telemetry model, which is implemented in the baseboard management controller of the nodes. This integration eliminates the requirements of any agent, plugin, hardware component, or configuration on the monitored nodes. It is potentially a paradigm shift in Nagios-based monitoring for two reasons. First, it simplifies communication between the Nagios server and monitored nodes. Second, it saves the computational cost by removing the requirements of running complex Nagios-native protocols and agents on the monitored nodes. The Redfish-Nagios integration methodology enables monitoring of next-generation HPC systems using the scalable and modern Redfish telemetry model and interface.