Autotuning for high performance computing
D. Padua
High Performance Computational Finance, 2011-11-13. DOI: 10.1145/2088256.2088264
Citations: 1
Abstract
Program performance depends not only on the algorithms and data structures implemented in the program but also on coding parameters. These parameters include the frequency and size of messages, the shape of loop tiles, and the minimum number of iterations required to execute a loop in parallel. Selecting the right algorithms, data structures, and coding parameters for a given target machine can be an onerous task, in part because of the many machine parameters that must be taken into account and the interactions among them. Important machine parameters include cache size, memory bandwidth, communication costs, and overheads. Furthermore, some of these selections must often be reassessed when porting to a different machine, even when that machine does not differ significantly from the original target.
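Loop-tile shape is a good example of a coding parameter: it changes the order in which a loop nest touches memory, but not the result. The following is a minimal Python sketch, assuming square matrices stored as lists of lists; the function name `matmul_tiled` and its `tile` argument are hypothetical illustrations, not code from the paper. Which tile size runs fastest depends on machine parameters such as cache size.

```python
def matmul_tiled(a, b, tile):
    """Return the product of square matrices a and b (lists of lists), computed tile by tile.

    `tile` is the coding parameter (hypothetical name): it sets the edge length of the
    square blocks processed together, and therefore how much data each block touches.
    """
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):          # row tiles of c and a
        for kk in range(0, n, tile):      # tiles along the shared dimension
            for jj in range(0, n, tile):  # column tiles of c and b
                # The inner loops cover one tile; min() clips tiles at the matrix edge
                # when `tile` does not divide n.
                for i in range(ii, min(ii + tile, n)):
                    a_row, c_row = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik, b_row = a_row[k], b[k]
                        for j in range(jj, min(jj + tile, n)):
                            c_row[j] += a_ik * b_row[j]
    return c
```

For example, `matmul_tiled(a, b, 8)` and `matmul_tiled(a, b, 32)` return the same matrix; only the memory-access pattern, and therefore the running time on a given cache hierarchy, differs.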
It is clearly advantageous to use tools and techniques that help reduce the initial effort of programming for performance as well as the cost of porting. The tool that comes to mind first is the compiler. Compilers were developed to enable machine-independent programming and, to this end, apply powerful code generation and optimization strategies that take machine parameters into account. However, compilers do not always suffice. They operate almost exclusively at the coding level, and even at this low level they are not always effective. For example, compilers often fail to reorganize loops in the best way when generating code for microprocessor vector extensions; making good use of these extensions today requires manual intervention.
Autotuning programs are those capable of generating one or several versions of a program. These versions could be derived from a parameterized program, or from descriptions at a higher level of abstraction that could take the form of algorithms or even problem specifications. It is also desirable for the generation process to take into account target machine parameters and the characteristics of the input data.
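Generating versions from a parameterized program can be as simple as binding each candidate parameter value to a kernel, with a machine parameter used to prune the candidates. The sketch below assumes a rule of thumb that the three tiles touched by a tiled matrix multiplication should fit in a cache of `cache_bytes` bytes; the names `candidate_tiles` and `make_versions`, and the pruning rule itself, are illustrative assumptions rather than the paper's method.

```python
from functools import partial


def candidate_tiles(n, cache_bytes, elem_bytes=8):
    """Return power-of-two tile sizes t <= n whose three t x t tiles fit in cache_bytes.

    The "three tiles in cache" test is an assumed heuristic, not a rule from the paper.
    """
    tiles = []
    t = 1
    while t <= n:
        if 3 * t * t * elem_bytes <= cache_bytes:
            tiles.append(t)
        t *= 2
    return tiles


def make_versions(kernel, n, cache_bytes):
    """Bind each candidate tile size to `kernel`, giving one callable version per tile size."""
    return {t: partial(kernel, tile=t) for t in candidate_tiles(n, cache_bytes)}
```

With a hypothetical 32 KiB cache and 8-byte elements, `candidate_tiles(64, 32 * 1024)` keeps the sizes 1 through 32 and drops 64, whose three tiles would need 96 KiB.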
When multiple versions are generated, one is selected at compile time or at run time through an empirical search that executes the versions with representative data and measures their performance to guide the selection.
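The empirical search described above can be sketched as an exhaustive timing loop. This is a minimal illustration, assuming each version is a Python callable and that `args` holds representative input data; the name `empirical_search` is hypothetical, and practical autotuners often use smarter search strategies than trying every version.

```python
import time


def empirical_search(versions, args=(), repeats=3):
    """Run every version on representative arguments and return the label of the fastest.

    `versions` maps a label (for example, a tile size) to a callable. Each version is
    timed `repeats` times and its minimum time is kept, which reduces the effect of
    interference from other activity on the machine.
    """
    timings = {}
    for label, version in versions.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            version(*args)
            best = min(best, time.perf_counter() - start)
        timings[label] = best
    winner = min(timings, key=timings.get)
    return winner, timings
```

The search can run once, at compile or installation time, with the winner recorded for later use, or at run time when the best version depends on characteristics of the actual input.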
Autotuning programs can be written in conventional languages such as Fortran, C, C++, or Java, annotated with transformations that can be applied to the whole program or to code segments. Alternatively, they can be written in a very high-level declarative notation that represents the algorithms or problems to be solved.
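Annotation-based autotuners typically attach the tunable choices to the code itself, for instance as pragmas or structured comments in C or Fortran. Python has no pragmas, so this sketch uses a decorator as a stand-in: the hypothetical `tunable` decorator records a search space on a function, and the hypothetical `expand` helper turns every combination of annotated values into a separate version. Neither name comes from the paper or from any existing tool.

```python
import itertools
from functools import partial


def tunable(**space):
    """Annotate a function with a search space mapping parameter names to candidate values."""
    def annotate(func):
        func.search_space = space
        return func
    return annotate


def expand(func):
    """Return one version per combination of annotated values, keyed by the parameter settings."""
    names = sorted(func.search_space)
    versions = {}
    for values in itertools.product(*(func.search_space[name] for name in names)):
        settings = tuple(zip(names, values))
        versions[settings] = partial(func, **dict(settings))
    return versions


@tunable(tile=[16, 32, 64], unroll=[1, 4])
def kernel(data, tile, unroll):
    # Placeholder: this body ignores tile and unroll; a real kernel would use them
    # to restructure its loops while computing the same result.
    return sum(data)
```

Here `expand(kernel)` yields six versions, one for each (tile, unroll) pair, which an empirical search could then time and rank.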
Although the initial cost of developing an autotuning program is higher than that of developing a conventional program, much of the analysis required for the first target machine is done automatically, and the program can be ported across machines and machine classes while maintaining good performance.