Extension VM: Interleaved Data Layout in Vector Memory
Authors: Dunbo Zhang, Qingjie Lang, Ruoxi Wang, Li Shen
Journal: ACM Transactions on Architecture and Code Optimization
Published: 2023-11-07
DOI: 10.1145/3631528 (https://doi.org/10.1145/3631528)
Abstract
Vector architecture is widely employed in processors for neural networks, signal processing, and high-performance computing; however, its performance is limited by inefficient column-major memory access. This limitation originates from the unsuitable mapping of multidimensional data structures to two-dimensional vector memory spaces. In addition, the traditional data layout mapping method creates an irreconcilable conflict between row- and column-major accesses. Ideally, both row- and column-major accesses should be able to take advantage of the bank parallelism of vector memory. To this end, we propose the Interleaved Data Layout (IDL) method in vector memory, which distributes vector elements into different banks regardless of whether they are accessed in row- or column-major order, so that any vector memory access can benefit from bank parallelism. Additionally, we propose an Extension Vector Memory (EVM) architecture to achieve IDL in vector memory. EVM can support two data layout methods and vector memory access modes simultaneously. The key idea is to continuously distribute the data that needs to be accessed from the main memory to different banks during the loading period. Thus, EVM can provide a higher degree of spatial locality through careful programming and ISA extension support. Experimental results show that the proposed architecture achieves a 1.43-fold improvement over state-of-the-art vector processors, with an area cost of only 1.73%. Furthermore, energy consumption is reduced by 50.1%.
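The bank-conflict problem described in the abstract can be illustrated with a small sketch. The abstract does not specify the actual IDL mapping, so the code below substitutes a classic skewed mapping, bank = (row + col) mod B, purely as an assumption for illustration. The bank count and the function names `conventional_bank`, `skewed_bank`, and `banks_touched` are likewise hypothetical and not taken from the paper.

```python
NUM_BANKS = 8  # hypothetical bank count, chosen only for illustration


def conventional_bank(row, col, num_banks=NUM_BANKS):
    """Conventional layout: the bank depends only on the column index."""
    return col % num_banks


def skewed_bank(row, col, num_banks=NUM_BANKS):
    """Skewed layout (an assumed stand-in, not the paper's IDL mapping):
    each row is rotated by its row index before being spread over banks."""
    return (row + col) % num_banks


def banks_touched(bank_of, coords):
    """Number of distinct banks hit by one vector access over `coords`."""
    return len({bank_of(r, c) for r, c in coords})


# One vector access along a row and one along a column of an 8 x 8 tile.
row_access = [(3, c) for c in range(NUM_BANKS)]
col_access = [(r, 3) for r in range(NUM_BANKS)]

for name, mapping in [("conventional", conventional_bank),
                      ("skewed", skewed_bank)]:
    print(name,
          "row access banks:", banks_touched(mapping, row_access),
          "column access banks:", banks_touched(mapping, col_access))
```

Under the conventional mapping, the column access lands entirely in one bank and must be serialized, whereas the row access spreads over all eight. Under the skewed mapping, both access directions touch all eight banks, which is the property the abstract attributes to IDL.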
About the journal:
ACM Transactions on Architecture and Code Optimization (TACO) focuses on hardware, software, and system research spanning the fields of computer architecture and code optimization. Articles that appear in TACO will either present new techniques and concepts or report on experiences and experiments with actual systems. Insights useful to architects, hardware or software developers, designers, builders, and users will be emphasized.