Performance Analysis of Compressed Batch Matrix Operations on Small Matrices
B. Gravelle, B. Norris
2019 International Conference on High Performance Computing & Simulation (HPCS), 2019-07-01
DOI: 10.1109/HPCS48598.2019.9188206 (https://doi.org/10.1109/HPCS48598.2019.9188206)
Citations: 0
Abstract
Dense matrix computations on very small matrices present unique challenges for performance optimization and occupy an important place in many HPC computations, including PDE solvers, machine learning algorithms, and Kalman filters. Batch computation can improve their performance significantly, and compressed batch (also called block-interleaved) data structures can improve it further. In this paper, we present a detailed study of how compressed batch computations use HPC hardware and how they can be tuned most effectively for cache performance.
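
To make the term "block-interleaved" concrete, here is a minimal sketch of one common form of the layout. It is not the authors' implementation. The function names (`interleave`, `index`, `batched_matmul`) and the `block` parameter are hypothetical, and the sketch assumes the batch size is divisible by `block`, which in a real kernel would typically be chosen to match the SIMD vector width. Element (i, j) of `block` consecutive matrices is stored contiguously, so the innermost loop walks across matrices with unit stride.

```python
# Sketch of a block-interleaved ("compressed batch") layout for a batch of
# small n x n matrices. All names are hypothetical, for illustration only.

def interleave(batch, n, block):
    """Pack matrices so that element (i, j) of `block` consecutive matrices
    occupies adjacent slots in one flat buffer."""
    assert len(batch) % block == 0, "sketch assumes batch size divisible by block"
    flat = []
    for start in range(0, len(batch), block):
        for i in range(n):
            for j in range(n):
                for lane in range(block):
                    flat.append(batch[start + lane][i][j])
    return flat


def index(n, block, m, i, j):
    """Flat offset of element (i, j) of matrix m in the interleaved buffer."""
    group, lane = divmod(m, block)
    return ((group * n + i) * n + j) * block + lane


def batched_matmul(a, b, n, block, count):
    """Compute C[m] = A[m] @ B[m] for all `count` matrices on interleaved
    buffers. The innermost loop runs across lanes with unit stride; in a
    compiled implementation this is the loop one would expect to vectorize."""
    c = [0.0] * (count * n * n)
    for group in range(count // block):
        base = group * n * n * block
        for i in range(n):
            for j in range(n):
                ic = base + (i * n + j) * block
                for k in range(n):
                    ia = base + (i * n + k) * block
                    ib = base + (k * n + j) * block
                    for lane in range(block):
                        c[ic + lane] += a[ia + lane] * b[ib + lane]
    return c
```

With this layout, each step of the small matrix product touches `block` adjacent values from each operand, which is why the choice of `block` interacts with vector width and cache behavior.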