Hostile Cache Implications for Small, Dense Linear Solves

2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC). Pub Date: 2020-10-02. DOI: 10.1109/MCHPC51950.2020.00010

Tom Deakin, J. Cownie, Simon McIntosh-Smith, J. Lovegrove, R. Smedley-Stevenson

Abstract

Fully assembling the stiffness matrix in finite element codes can be prohibitive because of the memory footprint required to store such a large matrix. An optimisation and workaround, particularly effective for discontinuous Galerkin based approaches, is to construct and solve the small dense linear systems locally within each element and avoid the global assembly entirely. The independent linear systems can be solved concurrently in a batched manner; however, we have found that the memory subsystem can exhibit destructive behaviour in this paradigm, severely affecting performance. In this paper we demonstrate the range of performance that can be obtained by allocating the local systems differently, and present evidence for the causes of these differences.
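The abstract does not say which memory-subsystem mechanism is responsible, so the following is only a minimal sketch of one commonly cited candidate: conflict misses in a set-associative cache when many equally sized local systems are stored at a power-of-two stride. The cache geometry, the system size `n`, the batch size, and the one-line padding are all assumptions chosen for illustration, not values from the paper.

```python
# Illustrative model only: cache geometry and matrix sizes are assumptions,
# not values taken from the paper.
LINE_BYTES = 64   # assumed cache line size in bytes
NUM_SETS = 64     # assumed: 32 KiB, 8-way cache -> 32768 / (64 * 8) sets


def cache_set(addr):
    """Set index of a byte address in a simple modulo-indexed cache."""
    return (addr // LINE_BYTES) % NUM_SETS


def distinct_sets_touched(num_systems, stride_bytes):
    """Count distinct sets hit when the first element of every system is read."""
    return len({cache_set(i * stride_bytes) for i in range(num_systems)})


n = 32                                     # hypothetical local system size (n x n doubles)
matrix_bytes = n * n * 8                   # 8192 bytes, a power of two
padded_bytes = matrix_bytes + LINE_BYTES   # one cache line of padding per system

# Back-to-back allocation: every system starts in the same cache set.
unpadded = distinct_sets_touched(64, matrix_bytes)
# Padded allocation: the start addresses spread across all sets.
padded = distinct_sets_touched(64, padded_bytes)

print(unpadded, padded)
```

In this toy model the back-to-back layout maps the same element of all 64 systems to a single set, while adding one cache line of padding per system spreads them over all 64 sets. Whether this is the effect measured in the paper, and whether padding is among the allocation strategies it compares, is not stated in the abstract.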
