End-to-end framework for fault management for open source clusters: Ranger
John L. Hammond, T. Minyard, J. Browne
TeraGrid Conference, 2010-08-02. DOI: 10.1145/1838574.1838583 (https://doi.org/10.1145/1838574.1838583)
The scale and complexity of both hardware and software on large open source software systems such as Ranger make faults and failures inevitable. What is not inevitable is that they go undetected, or that diagnosis and recovery remain largely manual and effort-intensive. This paper presents a framework for end-to-end fault management that is being developed on Ranger but targets open source clusters in general. The framework has three elements: a rationalized system logging stack for Linux, low-overhead log and status monitoring, and a multilevel suite of diagnostic analyses. The paper describes the framework, reports the accomplishments to date and the results obtained with the elements already in place, and outlines plans for future development, including a solicitation for collaboration on the project.