Out-of-Step Pipeline for Gather/Scatter Instructions
Yi Ge, K. Yoda, Makiko Ito, Toshiyuki Ichiba, T. Yoshikawa, Ryota Shioya, M. Goshima
2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2023-04-01
DOI: https://doi.org/10.23919/DATE56975.2023.10137119
Abstract
Wider SIMD units suffer from the poor scalability of gather/scatter instructions, which appear in sparse matrix calculations. We address this problem with an out-of-step pipeline, which tolerates bank conflicts in a multibank L1 data cache (L1D) by allowing the element operations of SIMD instructions to proceed out of step with each other. We evaluated it with a sparse matrix-vector product kernel on matrices from HPCG and the SuiteSparse Matrix Collection. The results show that, for a SIMD width of 1024 bits, it achieves a 1.91-times improvement over a model of a conventional pipeline.
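
To make the bank-conflict problem concrete, the following is a toy sketch in Python, not the pipeline or the evaluation model from the paper. It assumes a hypothetical L1D with 16 banks interleaved by element index, one access per bank per cycle, and no other stalls. The function names `bank_of`, `lockstep_cycles`, and `out_of_step_lower_bound` are illustrative only. The first cost function models a gather whose elements must all finish before the next gather starts; the second gives an idealized lower bound when element accesses from different gathers may overlap freely.

```python
from collections import Counter


def bank_of(index, num_banks=16):
    # Assumption: banks are interleaved by element index,
    # so element i lives in bank i mod num_banks.
    return index % num_banks


def lockstep_cycles(gathers, num_banks=16):
    """Cycles when each gather completes before the next one begins.

    A gather costs as many cycles as the number of requests
    that land on its busiest bank.
    """
    total = 0
    for indices in gathers:
        per_bank = Counter(bank_of(i, num_banks) for i in indices)
        total += max(per_bank.values(), default=0)
    return total


def out_of_step_lower_bound(gathers, num_banks=16):
    """Idealized bound when element accesses of different gathers overlap.

    Only the total load on the busiest bank matters, because idle
    banks can serve elements of later gathers in the meantime.
    """
    per_bank = Counter(
        bank_of(i, num_banks) for indices in gathers for i in indices
    )
    return max(per_bank.values(), default=0)


# Three 4-element gathers, given as element indices into a dense vector.
gathers = [
    [0, 16, 32, 48],  # all four elements map to bank 0
    [1, 17, 33, 49],  # all four elements map to bank 1
    [2, 3, 4, 5],     # four different banks, no conflict
]

print(lockstep_cycles(gathers))          # 4 + 4 + 1 = 9
print(out_of_step_lower_bound(gathers))  # busiest bank serves 4 requests
```

In this toy setting the lock-step cost is 9 cycles while the overlap bound is 4, which illustrates why letting element operations drift apart can hide conflicts. A real pipeline has finite buffering and ordering constraints, so the bound is generally not reachable.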