Performance Analysis of Compressed Batch Matrix Operations on Small Matrices

2019 International Conference on High Performance Computing & Simulation (HPCS). Pub Date: 2019-07-01. DOI: 10.1109/HPCS48598.2019.9188206

B. Gravelle, B. Norris


Abstract

Dense matrix computations with very small matrices present unique challenges for performance optimization and occupy an important space in many HPC computations, including PDE solvers, machine learning algorithms, and Kalman filters. Batch computation can improve their performance significantly, and compressed batch (also called block-interleaved) data structures can improve it further. In this paper we present a detailed study of how compressed batch computations use HPC hardware and how they can be tuned most effectively for cache performance.
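To make the term "block-interleaved" concrete, the following is a minimal sketch of one common way such a layout is organized. It is not taken from the paper. The function names (`pack_interleaved`, `gemm_interleaved`, `unpack_interleaved`) and the `width` parameter are hypothetical, and `width` is assumed to stand in for the SIMD vector length. The idea shown is that element (i, j) of `width` consecutive matrices is stored contiguously, so the innermost loop runs across matrices with unit stride.

```python
def pack_interleaved(batch, n, width):
    """Pack n-by-n matrices (nested lists) into a block-interleaved flat buffer.

    Assumption: `width` matrices share one block, and within a block the
    same (i, j) entry of each matrix is stored side by side.
    """
    blocks = (len(batch) + width - 1) // width  # last block is zero-padded
    buf = [0.0] * (blocks * n * n * width)
    for m, mat in enumerate(batch):
        block, lane = divmod(m, width)
        base = block * n * n * width
        for i in range(n):
            for j in range(n):
                buf[base + (i * n + j) * width + lane] = mat[i][j]
    return buf


def gemm_interleaved(a, b, n, width):
    """Compute C = A * B for every matrix pair in two packed buffers."""
    c = [0.0] * len(a)
    blocks = len(a) // (n * n * width)
    for block in range(blocks):
        base = block * n * n * width
        for i in range(n):
            for j in range(n):
                c_off = base + (i * n + j) * width
                for k in range(n):
                    a_off = base + (i * n + k) * width
                    b_off = base + (k * n + j) * width
                    # Innermost loop walks across matrices with unit stride,
                    # which is the part a compiler could map to SIMD lanes.
                    for lane in range(width):
                        c[c_off + lane] += a[a_off + lane] * b[b_off + lane]
    return c


def unpack_interleaved(buf, n, width, count):
    """Recover the first `count` matrices from a packed buffer."""
    out = []
    for m in range(count):
        block, lane = divmod(m, width)
        base = block * n * n * width
        out.append([[buf[base + (i * n + j) * width + lane]
                     for j in range(n)] for i in range(n)])
    return out
```

In this sketch the choice of `width` controls how many matrices are touched together, which in turn determines how much data each block brings into cache. That is the kind of tuning knob the abstract refers to, although the paper's actual parameters and kernels may differ.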


