Title: Parallel and Pipelined BRAM-Based Matrix Transposition for 6G
Authors: Jierui Chen; Chuang Yang; Xu Zhou; Mugen Peng
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 72, no. 8, pp. 1033-1037
Published: 2025-06-30
DOI: 10.1109/TCSII.2025.3584052
URL: https://ieeexplore.ieee.org/document/11059326/
Impact factor: 4.9 (JCR Q2, Engineering, Electrical & Electronic)
Citation count: 0
Abstract
In this brief, we present a parallel and pipelined algorithm for BRAM-based matrix transposition, along with its corresponding architecture, optimized specifically to meet the stringent throughput and latency demands of 6G. The architecture uses a novel address mapping algorithm that exploits the coprimality between memory parameters to achieve conflict-free parallel access to BRAM via a simple yet efficient prime-modulo addressing scheme, which significantly improves parallelism and throughput. More importantly, by adopting a ping-pong buffering scheme, it enables fully pipelined and highly parallel matrix transposition, primarily targeting low-latency and high-throughput tasks in 6G. Experimental results show that, compared with existing implementations supporting similar matrix sizes, the architecture in this brief increases throughput from 0.8 GB/s to 25.6 GB/s (a 32-fold improvement) at a latency of 0.08 ms.
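The abstract does not spell out the address mapping itself, so the following is only an illustrative sketch of the general idea behind coprime (prime-modulo) bank interleaving, not the authors' algorithm. It assumes a row-major linear address `row * n_cols + col` and a bank index equal to that address modulo the number of banks; the names `bank_of`, `is_conflict_free`, and `transpose_via_banks` are hypothetical. When the bank count is coprime with the number of columns, any run of consecutive elements along a row or along a column, up to the bank count in length, falls in distinct banks, so both can be accessed in parallel without conflicts.

```python
def bank_of(row, col, n_cols, n_banks):
    """Bank index of element (row, col): linear address modulo the bank count."""
    return (row * n_cols + col) % n_banks


def is_conflict_free(n_rows, n_cols, n_banks, lanes):
    """Return True if every run of `lanes` consecutive elements, taken along
    any row or any column, maps to `lanes` distinct banks."""
    for r in range(n_rows):
        for c in range(n_cols - lanes + 1):
            banks = {bank_of(r, c + k, n_cols, n_banks) for k in range(lanes)}
            if len(banks) != lanes:
                return False
    for c in range(n_cols):
        for r in range(n_rows - lanes + 1):
            banks = {bank_of(r + k, c, n_cols, n_banks) for k in range(lanes)}
            if len(banks) != lanes:
                return False
    return True


def transpose_via_banks(matrix, n_banks):
    """Software model only: write the matrix row-wise into banked storage,
    then read it back column-wise to produce the transpose."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    depth = (n_rows * n_cols + n_banks - 1) // n_banks  # words per bank
    banks = [[None] * depth for _ in range(n_banks)]
    for r in range(n_rows):
        for c in range(n_cols):
            addr = r * n_cols + c
            banks[addr % n_banks][addr // n_banks] = matrix[r][c]
    transposed = []
    for c in range(n_cols):
        column = []
        for r in range(n_rows):
            addr = r * n_cols + c
            column.append(banks[addr % n_banks][addr // n_banks])
        transposed.append(column)
    return transposed
```

In this model, a 16 x 16 matrix spread over 17 banks passes the check for 16 parallel lanes, whereas 16 banks fail on column access because every element of a column lands in the same bank. A ping-pong arrangement would duplicate the banked storage so that one copy is written while the other is read; that part is not modeled here.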
About the journal:
TCAS II publishes brief papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
Circuits: Analog, Digital and Mixed Signal Circuits and Systems
Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
Software for Analog-and-Logic Circuits and Systems
Control aspects of Circuits and Systems.