Online Critical Path Profiling for Parallel Applications
Wenbin Zhu, P. Bridges, A. Maccabe
2005 IEEE International Conference on Cluster Computing, 2005-09-27
DOI: 10.1109/CLUSTR.2005.347048 (https://doi.org/10.1109/CLUSTR.2005.347048)
Citations: 6
Abstract
Online monitoring of parallel applications is increasingly important for techniques such as load balancing, protocol adaptation, and online anomaly detection. Unfortunately, existing online monitoring techniques monitor only individual hosts in a distributed-memory parallel application. In this paper, we show how a new monitoring technique, message-centric monitoring, can be used to monitor the complete critical path of a distributed-memory parallel application online. Results from IMPuLSE, an MPI-based message-centric monitoring prototype, show that it has less than 3% runtime overhead, accurately measures whole-system performance as the application runs, and captures data that nodes can use to detect unusual system behavior at runtime.
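
The abstract does not describe how IMPuLSE works internally, so the following is only a minimal sketch of one common way to track a critical path online: each message carries the length of the longest dependency chain that led to it, and the receiver keeps the larger of its own chain and the one carried by the message. The `Process` class, its methods, the `latency` parameter, and the message layout are all hypothetical names chosen for illustration; the sketch simulates two processes in plain Python and makes no MPI calls.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """A simulated process that tracks the longest dependency chain it has seen.

    Hypothetical illustration only; not the IMPuLSE implementation.
    """
    rank: int
    path_length: float = 0.0  # length of the longest chain ending at this process

    def compute(self, duration: float) -> None:
        # Local work extends whatever chain currently ends at this process.
        self.path_length += duration

    def send(self, latency: float = 0.0) -> dict:
        # Piggyback the sender's chain length (plus transfer time) on the message.
        return {"src": self.rank, "path_length": self.path_length + latency}

    def receive(self, message: dict) -> None:
        # The receiver cannot continue before the message arrives, so its
        # chain becomes the longer of its own and the one carried by the message.
        self.path_length = max(self.path_length, message["path_length"])


def demo() -> tuple:
    p0, p1 = Process(0), Process(1)
    p0.compute(5.0)
    p1.compute(2.0)
    p1.receive(p0.send(latency=1.0))  # p1 now depends on p0's work
    p1.compute(3.0)
    reply = p1.send(latency=1.0)
    p0.compute(1.0)
    p0.receive(reply)                 # p0 now depends on the chain through p1
    return p0.path_length, p1.path_length


if __name__ == "__main__":
    print(demo())
```

In this sketch every process learns the current critical-path length from the messages it receives, with no separate trace collection step, which is the property that makes the information usable while the application is still running.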