Automorphism Ensemble Decoding on GPU: Achieving High Throughput and Low Latency for Polar and RM Codes
Authors: Yansong Li, Kairui Tian, Rongke Liu
Journal: IEEE Transactions on Signal Processing, vol. 73, pp. 2227-2242 (impact factor 4.6; JCR Q1, Engineering, Electrical & Electronic)
Publication date: 2025-03-15
Publication type: Journal Article
DOI: 10.1109/TSP.2025.3570740
URL: https://ieeexplore.ieee.org/document/11005721/
Open access: No
Citation count: 0
Abstract
Automorphism ensemble decoding (AED) is a highly parallel approach that enables decoding of polar and Reed-Muller (RM) codes with automorphisms, offering a practical solution with near-maximum likelihood (ML) performance and manageable computational complexity. To meet the growing demands for high throughput and low latency in cloud and virtual random access networks, this paper presents a graphics processing unit (GPU)-based AED architecture for polar and RM codes, utilizing low-complexity successive cancellation (SC) and small list SC (SCL) decoders as the constituent decoders of AED. The proposed architecture exploits the inherent parallelism of AED to optimize decoding tasks on the GPU, significantly enhancing throughput by harnessing its massively parallel processing capabilities. Additionally, improved thread mapping and data management techniques substantially reduce latency for automorphism ensemble SC (Aut-SC) decoding, while a low-latency sorting mechanism further accelerates automorphism ensemble SCL (Aut-SCL) decoding. Experimental results on an NVIDIA RTX 4090 demonstrate that the proposed Aut-SC decoder, with an ensemble size of 8, achieves a throughput exceeding 17 Gbps under highly parallelized batch processing. Compared with state-of-the-art software-based SCL decoders, the proposed GPU-based Aut-SC and Aut-SCL architectures outperform existing solutions by factors of up to 28× and 10×, respectively, in normalized throughput while maintaining the same or even superior error correction performance.
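To make the AED principle in the abstract concrete, the following is a minimal CPU sketch in Python, not the GPU architecture of the paper. It assumes an RM(2, 4) code, a min-sum SC decoder as the constituent decoder, index-bit permutations as the automorphisms, and a correlation metric to pick the final candidate. All function names (`rm_frozen`, `encode`, `sc_decode`, `bit_permutation`, `aed_decode`) are hypothetical and chosen for illustration only.

```python
from itertools import permutations
from math import copysign


def rm_frozen(r, m):
    """Frozen mask of RM(r, m): row i of the polar transform is frozen if wt(i) < m - r."""
    return [bin(i).count("1") < m - r for i in range(2 ** m)]


def encode(u):
    """Polar transform over GF(2) in natural bit order: x = (x_left XOR x_right, x_right)."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    left, right = encode(u[:h]), encode(u[h:])
    return [a ^ b for a, b in zip(left, right)] + right


def sc_decode(llr, frozen):
    """Recursive successive cancellation decoding; returns the estimated codeword."""
    if len(llr) == 1:
        return [0] if frozen[0] or llr[0] >= 0 else [1]
    h = len(llr) // 2
    a, b = llr[:h], llr[h:]
    # Check-node (f) update with the min-sum approximation.
    f = [copysign(1.0, p) * copysign(1.0, q) * min(abs(p), abs(q)) for p, q in zip(a, b)]
    left = sc_decode(f, frozen[:h])
    # Variable-node (g) update using the partial sums from the left branch.
    g = [q + (1 - 2 * s) * p for p, q, s in zip(a, b, left)]
    right = sc_decode(g, frozen[h:])
    return [s ^ t for s, t in zip(left, right)] + right


def bit_permutation(sigma, m):
    """Index permutation that moves bit j of every index to position sigma[j]."""
    return [sum(((k >> j) & 1) << sigma[j] for j in range(m)) for k in range(2 ** m)]


def aed_decode(llr, frozen, perms):
    """Run one SC decoder per automorphism and keep the best-correlated candidate."""
    best, best_metric = None, float("-inf")
    for p in perms:
        permuted = [llr[p[k]] for k in range(len(llr))]
        x_perm = sc_decode(permuted, frozen)
        x_hat = [0] * len(llr)
        for k, bit in enumerate(x_perm):
            x_hat[p[k]] = bit  # undo the permutation
        metric = sum((1 - 2 * bit) * l for bit, l in zip(x_hat, llr))
        if metric > best_metric:
            best, best_metric = x_hat, metric
    return best


# Usage: an ensemble of 8 bit permutations (the identity comes first).
m, r = 4, 2
frozen = rm_frozen(r, m)
perms = [bit_permutation(s, m) for s in list(permutations(range(m)))[:8]]
x = encode([0 if fz else 1 for fz in frozen])
llr = [(1 - 2 * bit) * (1.0 + 0.5 * i) for i, bit in enumerate(x)]
print(aed_decode(llr, frozen, perms) == x)  # every LLR sign is correct here, so this prints True
```

The loop over `perms` in `aed_decode` is the part that parallelizes naturally: each permuted SC decoding is independent of the others, which is the property a GPU implementation can exploit. Index-bit permutations preserve the RM frozen set because they preserve the Hamming weight of each index; for a general polar code the set of usable automorphisms is more restricted.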
About the journal:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.