IPMI-based Efficient Notification Framework for Large Scale Cluster Computing
C. Leangsuksun, T. Rao, Anand Tikotekar, S. Scott, Richard Libby, J. Vetter, Yung-Chin Fang, H. Ong
Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID'06), 2006-05-16
DOI: 10.1109/CCGRID.2006.150 (https://doi.org/10.1109/CCGRID.2006.150)
Citations: 11
Abstract
The demand for efficient fault tolerance systems has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming data and event management task. The increasing level of detail at the hardware and software layers clearly affects the scalability and performance of monitoring and management tools. In this paper, we propose a problem notification framework that directly addresses the issue of monitoring scalability. We first present the design and implementation of our step-by-step approach to analyzing, filtering, and classifying the plethora of node statistics. Then, we present experimental results showing that our approach needs only minimal system resources and thus has low overhead. Finally, we introduce our Web-based cluster management system, which provides hardware controls at both the cluster and node levels.