Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study

2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum. Published 2012-05-21. DOI: 10.1109/IPDPSW.2012.63

S. Ramalingam, Mary W. Hall, Chun Chen

Abstract

Scientific libraries are written in a general way, in anticipation of a variety of use cases, and this generality reduces optimization opportunities. Significant performance gains can be achieved by specializing library code to its execution context: the application in which it is invoked, the input data set used, and the architectural platform and its backend compiler. Such specialization is not typically done because it is time consuming, leads to nonportable code, and requires performance-tuning expertise that application scientists may not have. Tool support for library specialization in this context could reduce the in-depth knowledge required while significantly improving performance, code reuse, and portability. In this work, we study the performance gains achieved by specializing the single-processor sparse linear algebra functions in PETSc (Portable, Extensible Toolkit for Scientific Computation) in the context of three scalable scientific applications on the Hopper Cray XE6 supercomputer at NERSC. We use CHiLL (Composable High-Level Loop Transformation Framework) to apply source-level transformations tailored to the special needs of sparse computations and to automatically generate highly optimized PETSc functions. We demonstrate performance improvements of more than 1.8X on the library functions and overall gains of 9 to 24% on three scalable applications that use PETSc's sparse matrix capabilities.
