Architecture for an Offline Parallel Debugger
Authors: Karl Lindekugel, A. DiGirolamo, D. Stanzione
Venue: 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications
Publication date: 2008-12-10
DOI: 10.1109/ISPA.2008.125 (https://doi.org/10.1109/ISPA.2008.125)
Citations: 3
Abstract
This paper provides an overview of the GDBase framework for offline parallel debuggers. The framework was designed to become the basis of debugging tools that scale successfully on systems with tens to hundreds of thousands of cores. With several systems coming online at more than 50,000 cores in the past year, debuggers that can run at these scales are now required. The proposed framework offers two features not found in current-generation debugging tools: the ability to debug "offline", and a central database that acts as a repository of debugging information. These two features give the GDBase debugger several advantages. The debugger can be used in conjunction with modern batch systems with low overhead, with user interaction taking place after the parallel system resources are freed. The use of a database and a simple API allows multiple interfaces and data mining tools to be implemented, providing novel ways of viewing and analyzing debugging data. The database also enables cross-run analysis and the combination of debugging, performance, and system health information. Evidence of the framework's scalability is provided, along with output from several simple analysis tools that have been implemented.
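To make the idea of a central debugging database with a simple API more concrete, the following is a minimal sketch in Python using the standard `sqlite3` module. It is not the GDBase schema or API, which the abstract does not describe. The table layout and the names `open_store`, `record_event`, and `locations_by_run` are assumptions invented for illustration. The sketch shows events being logged per process during a batch run and then queried afterwards, across runs, once the parallel resources have been released.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a hypothetical event store (the schema is an assumption, not GDBase's)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               run_id   INTEGER,  -- which batch job produced the event
               rank     INTEGER,  -- which parallel process reported it
               kind     TEXT,     -- e.g. 'signal' or 'breakpoint'
               location TEXT,     -- source location, e.g. 'solver.c:42'
               detail   TEXT
           )"""
    )
    return conn


def record_event(conn, run_id, rank, kind, location, detail=""):
    """Log one debugging event; no user interaction is needed at run time."""
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
        (run_id, rank, kind, location, detail),
    )


def locations_by_run(conn, kind):
    """Offline, cross-run query: how many distinct ranks hit each location per run."""
    rows = conn.execute(
        """SELECT location, run_id, COUNT(DISTINCT rank)
           FROM events
           WHERE kind = ?
           GROUP BY location, run_id
           ORDER BY location, run_id""",
        (kind,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = open_store()
    record_event(store, 1, 0, "signal", "solver.c:42", "SIGSEGV")
    record_event(store, 1, 1, "signal", "solver.c:42", "SIGSEGV")
    record_event(store, 2, 0, "signal", "solver.c:42", "SIGSEGV")
    for row in locations_by_run(store, "signal"):
        print(row)
```

Because the events live in an ordinary database, any number of separate front ends or data mining scripts could issue queries like `locations_by_run` without touching the original job. This is the kind of flexibility the abstract attributes to the database-backed design.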