HCW 2014 Keynote Talk
D. Abramson
2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014-05-19
DOI: 10.1109/IPDPSW.2014.207 (https://doi.org/10.1109/IPDPSW.2014.207)
Citation count: 0
Abstract
Summary form only given. CCDB implements a strategy called "comparative debugging", which helps trace software errors by comparing two executions of a program at the same time: one a reference version, the other faulty. Specifically, users write "assertions" that detect when the contents of data structures in the two executions diverge; the dataflow of the code can then be used to locate the source of the divergence. Comparative debugging is effective at finding errors when code is migrated from one platform to another, which is of significant interest for hybrid computer architectures containing CPUs and accelerators. In this talk I will discuss the design and implementation of CCDB and show that it operates on highly parallel hybrid CPU/GPU systems. CCDB provides a uniform comparison interface that allows programmers to examine the global runtime status across different types of hybrid programs, including OpenACC and UPC programs. I will present a case study in finding errors using the hybrid version of the stellarator particle simulation DELTA5D on the Titan machine at ORNL. I will also illustrate that the debugger scales well and is effective with up to 10,000 nodes and 5,000 GPUs.
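
The comparison idea described above can be illustrated with a small sketch. It is not CCDB's interface, which the abstract does not show. It assumes each execution can be reduced to an ordered trace of named data-structure snapshots, and all names (`reference_run`, `ported_run`, `first_divergence`) are hypothetical. The "assertion" here is simply that corresponding snapshots agree within a tolerance; the first snapshot that violates it narrows down where the fault was introduced.

```python
import math
from typing import List, Optional, Tuple

# A trace is an ordered list of (checkpoint name, snapshot of a data structure).
Trace = List[Tuple[str, List[float]]]


def reference_run(xs: List[float]) -> Trace:
    """Hypothetical reference version: records each intermediate array."""
    scaled = [2.0 * x for x in xs]
    shifted = [s + 1.0 for s in scaled]
    total = [sum(shifted)]
    return [("scaled", scaled), ("shifted", shifted), ("total", total)]


def ported_run(xs: List[float]) -> Trace:
    """Hypothetical ported version with a deliberate bug in the 'shifted' step."""
    scaled = [2.0 * x for x in xs]
    # Bug: the last element is copied without being shifted.
    shifted = [s + 1.0 for s in scaled[:-1]] + scaled[-1:]
    total = [sum(shifted)]
    return [("scaled", scaled), ("shifted", shifted), ("total", total)]


def first_divergence(
    ref: Trace, suspect: Trace, tol: float = 1e-9
) -> Optional[Tuple[str, int]]:
    """Return (checkpoint, element index) of the earliest mismatch, or None.

    Checkpoints are compared in trace order, so the result points at the
    earliest snapshot where the two executions stop agreeing.
    """
    for (name_r, val_r), (name_s, val_s) in zip(ref, suspect):
        if name_r != name_s:
            raise ValueError(f"checkpoint mismatch: {name_r!r} vs {name_s!r}")
        for i, (a, b) in enumerate(zip(val_r, val_s)):
            if not math.isclose(a, b, rel_tol=0.0, abs_tol=tol):
                return (name_r, i)
        if len(val_r) != len(val_s):
            # Shapes differ: report the first index present in only one run.
            return (name_r, min(len(val_r), len(val_s)))
    return None


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(first_divergence(reference_run(data), ported_run(data)))
```

In this toy example the `total` snapshot also differs, but the earlier `shifted` checkpoint is reported first, which mirrors how following the dataflow back from a visible wrong result leads to the step that introduced it.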