FTS 2016 Workshop Keynote Speech

D. Abramson

Abstract

Debugging software has always been difficult, with little tool support available. Finding faults in parallel programs is even harder: the machines and problems are so large that the amount of state to be examined becomes prohibitive. Faults are often introduced when codes are modified, when the software or hardware environment changes, or when codes are scaled up to solve larger problems. All too often we hear programmers scream "It's not my fault!"

Over the years we have developed a technique called "Relative Debugging", in which a code is debugged against another, reference, version. This simplifies the process because programmers can compare the state of the computation in a faulty version with that of an earlier version known to be correct, so they do not need a mental model of what the program state should be. However, relative debugging can also be expensive because it needs to compare large data structures across the machine. Parallel computers offer a way of accelerating the comparisons using parallel algorithms, making the technique practical. In this talk I will introduce relative debugging, show how it assists testing and debugging, and discuss the various techniques used to scale it up to very large problems and machines.

Bio: Professor David Abramson has been involved in computer architecture and high performance computing research since 1979. He has held appointments at Griffith University, CSIRO, RMIT and Monash University. At CSIRO he was the program leader of the Division of Information Technology High Performance Computing Program, and was also an adjunct Associate Professor at RMIT in Melbourne. He served as a program manager and chief investigator in the Co-operative Research Centre for Intelligent Decisions Systems and the Co-operative Research Centre for Enterprise Distributed Systems. He was the Director of the Monash e-Education Centre and a Professor of Computer Science in the Faculty of Information Technology at Monash University. Abramson is currently the Director of the Research Computing Centre at the University of Queensland. He is a Fellow of the Association for Computing Machinery (ACM), the Australian Academy of Technological Sciences and Engineering (ATSE) and the Australian Computer Society (ACS), and a Senior Member of the IEEE.

2016 IEEE International Conference on Cluster Computing. © 2016 IEEE. DOI 10.1109/CLUSTER.2016.98
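
As a rough illustration of the idea described in the abstract, the sketch below compares the same variable captured from a reference run and a suspect run, splitting the comparison into chunks that are checked concurrently. It is not the speaker's tool or algorithm: the function names (`compare_chunk`, `relative_compare`), the tolerance, and the use of a thread pool are all assumptions made for this example, and Python threads here only illustrate the partitioning, not real parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def compare_chunk(ref, sus, start, stop, tol):
    """Return indices in [start, stop) where the two versions disagree beyond tol."""
    return [i for i in range(start, stop)
            if not math.isclose(ref[i], sus[i], rel_tol=tol, abs_tol=tol)]


def relative_compare(ref, sus, tol=1e-9, workers=4):
    """Compare one variable's state from a reference run and a suspect run.

    Hypothetical helper: the data is split into contiguous chunks and each
    chunk is compared by a separate worker; mismatching indices are returned
    in ascending order.
    """
    if len(ref) != len(sus):
        raise ValueError("state snapshots differ in length")
    n = len(ref)
    step = max(1, math.ceil(n / workers))
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda b: compare_chunk(ref, sus, b[0], b[1], tol), bounds)
        return [i for part in parts for i in part]


# Reference state from the "correct" version, and a suspect copy with one
# deliberately injected discrepancy.
reference = [0.1 * i for i in range(1000)]
suspect = list(reference)
suspect[417] += 1e-3

print(relative_compare(reference, suspect))  # prints [417]
```

The point of the sketch is that the programmer never states what the value at index 417 should be; the reference version supplies the expected state, and the comparison only reports where the two runs diverge.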