Autotuning for high performance computing

D. Padua
High Performance Computational Finance, published 2011-11-13
DOI: 10.1145/2088256.2088264 (https://doi.org/10.1145/2088256.2088264)
Citations: 1

Abstract

Program performance depends not only on the algorithms and data structures implemented in the program but also on coding parameters. These parameters include the frequency and size of messages, the shape of loop tiles, and the minimum number of iterations required for parallel execution of a loop. Making the right selection of algorithms, data structures, and coding parameters for a given target machine can be an onerous task, in part because of the many machine parameters that must be taken into account and the interactions between them. Important machine parameters include cache size, memory bandwidth, communication costs, and overhead. Furthermore, some of the selections must often be reassessed when porting to a different machine, even when this machine does not differ significantly from the original target.

It is clearly advantageous to make use of tools and techniques that help reduce the initial effort of programming for performance as well as the cost of porting. The tool that comes first to mind is the compiler. Compilers were developed to enable machine-independent programming and, to this end, apply powerful code generation and optimization strategies that take machine parameters into account. However, compilers do not always suffice. They operate almost exclusively at the coding level, and even at this low level they are not always effective. For example, compilers often fail to reorganize loops in the best manner when generating code for microprocessor vector extensions. Good use of these extensions today requires manual intervention.

Autotuning programs are those capable of generating one or several versions of a program. These versions could be derived from a parameterized program, or from descriptions at a higher level of abstraction that could take the form of algorithms or even problem specifications. It is also desirable to take into account target machine parameters and the characteristics of the input data in the generation process.
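
As an illustration of deriving versions from a parameterized program, the following is a minimal sketch in Python, not a method taken from the paper. It assumes square matrices stored as lists of lists and treats the loop tile size as the coding parameter. The names `tiled_matmul` and `generate_versions` and the candidate tile sizes are hypothetical.

```python
from functools import partial


def tiled_matmul(a, b, tile):
    """Multiply square matrices a and b, visiting them in tile x tile blocks.

    `tile` is the coding parameter: it changes the loop structure (and thus
    the memory access pattern) but not the result.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def generate_versions(tile_sizes):
    """Return one specialized version of the kernel per candidate tile size."""
    return {tile: partial(tiled_matmul, tile=tile) for tile in tile_sizes}


# Hypothetical candidate tile sizes; a real tuner might derive them from
# machine parameters such as cache size.
versions = generate_versions([2, 4, 8])
```

Every entry in `versions` computes the same product; only the loop tiling differs, which is what makes the versions interchangeable candidates for later selection.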

When multiple versions are generated, one is selected at compile time or at run time by carrying out an empirical search that executes the versions with representative data and measures program performance to guide the selection.

Autotuning programs can be written in conventional code such as Fortran, C, C++, or Java, annotated with transformations that can be applied to the whole program or to code segments. Alternatively, autotuning programs can be written in a very high-level declarative notation that represents algorithms or problems to be solved.

Although the initial cost of developing an autotuning program is higher than that of developing a conventional program, it has the advantage that much of the analysis required for the first target machine is done automatically and that it can be ported across machines and machine classes while maintaining good performance.
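
The empirical search described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the author's implementation: the three summation routines stand in for generated versions, the sample list stands in for representative data, and all names (`empirical_search`, `sum_builtin`, `sum_loop`, `sum_chunked`) are hypothetical.

```python
import time


def sum_builtin(data):
    """Version 1: rely on the built-in reduction."""
    return sum(data)


def sum_loop(data):
    """Version 2: explicit element-by-element loop."""
    total = 0
    for x in data:
        total += x
    return total


def sum_chunked(data, chunk=1024):
    """Version 3: reduce fixed-size chunks, then combine the partial sums."""
    total = 0
    for start in range(0, len(data), chunk):
        total += sum(data[start:start + chunk])
    return total


def empirical_search(versions, sample, repeats=5):
    """Run every version on the sample data and keep its best wall-clock time.

    Returns the name of the fastest version and the full timing table.
    """
    timings = {}
    for name, fn in versions.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(sample)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get), timings


versions = {"builtin": sum_builtin, "loop": sum_loop, "chunked": sum_chunked}
sample = list(range(100_000))  # assumed to be representative input
best_name, timings = empirical_search(versions, sample)
tuned_sum = versions[best_name]  # the version selected for this machine
```

Which version wins depends on the machine and the input, so the search would need to be rerun after porting. Taking the minimum over several repeats is one common way to reduce timing noise; other statistics could be used instead.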