Push Me Pull You: Integrating Opposing Data Transport Modes for Efficient HPC Application Monitoring
O. Aaziz, J. Cook, Hadi Sharifi
2015 IEEE International Conference on Cluster Computing, published 2015-09-08
DOI: 10.1109/CLUSTER.2015.118
Citation count: 4
Abstract
While HPC system monitoring is a necessary and accepted practice, applications remain largely opaque in the production environment. For better HPC platform management and utilization, especially as platforms push toward exascale size, HPC applications need to make their execution more transparent in production. PROMON is a framework for application monitoring in the production environment, but its design concentrated on front-end issues, namely offering easy-to-use application instrumentation. This paper presents the integration of PROMON with LDMS, a proven, efficient HPC system monitoring framework. PROMON and LDMS offer a case study in integrating two disparate instrumentation and monitoring models, and the lessons learned apply to other HPC monitoring problems.
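The title frames this work as integrating opposing data transport modes. As a generic illustration only (the abstract does not describe PROMON's or LDMS's actual mechanism, so this is not the authors' design), the sketch shows one common way to bridge the two: instrumented application code pushes events into a lock-protected buffer whenever they occur, while a monitoring sampler pulls a consistent snapshot on its own schedule. The class `PushPullBridge`, its `push`/`sample` methods, and the metric name `iterations` are hypothetical.

```python
import threading
from typing import Dict


class PushPullBridge:
    """Hypothetical push/pull bridge (not the PROMON or LDMS API).

    Push side: application code calls push() whenever an event happens.
    Pull side: a sampler calls sample() at its own fixed interval.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts: Dict[str, int] = {}
        self._sums: Dict[str, float] = {}
        self._generation = 0  # bumped on every push so a puller can detect change

    def push(self, metric: str, value: float) -> None:
        """Called from instrumented application code (push side)."""
        with self._lock:
            self._counts[metric] = self._counts.get(metric, 0) + 1
            self._sums[metric] = self._sums.get(metric, 0.0) + value
            self._generation += 1

    def sample(self) -> dict:
        """Called by the periodic sampler (pull side).

        Returns copies, so the application can keep pushing while the
        snapshot is shipped elsewhere.
        """
        with self._lock:
            return {
                "generation": self._generation,
                "counts": dict(self._counts),
                "sums": dict(self._sums),
            }


if __name__ == "__main__":
    bridge = PushPullBridge()

    def app_rank(n: int) -> None:
        # Simulated instrumented code path pushing one event per iteration.
        for _ in range(n):
            bridge.push("iterations", 1.0)

    workers = [threading.Thread(target=app_rank, args=(1000,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(bridge.sample()["counts"]["iterations"])  # 4000
```

Copying the counters inside `sample()` keeps the lock hold time short and decouples the sampler's cadence from the application's event rate. A real HPC implementation would more likely use fixed-layout shared-memory buffers than Python dictionaries.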