Hostile Cache Implications for Small, Dense Linear Solves

Tom Deakin, J. Cownie, Simon McIntosh-Smith, J. Lovegrove, R. Smedley-Stevenson
{"title":"Hostile Cache Implications for Small, Dense Linear Solves","authors":"Tom Deakin, J. Cownie, Simon McIntosh-Smith, J. Lovegrove, R. Smedley-Stevenson","doi":"10.1109/MCHPC51950.2020.00010","DOIUrl":null,"url":null,"abstract":"The full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of memory footprint resulting from storing that enormous matrix. An optimisation and work around, particularly effective for discontinuous Galerkin based approaches, is to construct and solve the small dense linear systems locally within each element and avoid the global assembly entirely. The different independent linear systems can be solved concurrently in a batched manner, however we have found that the memory subsystem can show destructive behaviour in this paradigm, severely affecting the performance. In this paper we demonstrate the range of performance that can be obtained by allocating the local systems differently, along with evidence to attribute the reasons behind these differences.","PeriodicalId":318919,"journal":{"name":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/ACM Workshop on Memory Centric High Performance Computing (MCHPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCHPC51950.2020.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The full assembly of the stiffness matrix in finite element codes can be prohibitive in terms of memory footprint resulting from storing that enormous matrix. An optimisation and work around, particularly effective for discontinuous Galerkin based approaches, is to construct and solve the small dense linear systems locally within each element and avoid the global assembly entirely. The different independent linear systems can be solved concurrently in a batched manner, however we have found that the memory subsystem can show destructive behaviour in this paradigm, severely affecting the performance. In this paper we demonstrate the range of performance that can be obtained by allocating the local systems differently, along with evidence to attribute the reasons behind these differences.
小的、密集的线性解的敌对缓存含义
在有限元代码中,刚度矩阵的完整组装可能会因为存储巨大的矩阵而占用内存而令人望而却步。对于基于Galerkin的不连续方法,一种特别有效的优化和解决方法是在每个元素内部局部构建和求解小型密集线性系统,从而完全避免全局装配。不同的独立线性系统可以同时以批处理的方式求解,但是我们发现存储子系统在这种范式中会表现出破坏性行为,严重影响性能。在本文中,我们展示了通过不同地分配本地系统可以获得的性能范围,以及归因于这些差异背后原因的证据。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信