Lossless Parallel Implementation of a Turbo Decoder on GPU

2018 IEEE 25th International Conference on High Performance Computing (HiPC)
Pub Date: 2018-12-01
DOI: 10.1109/HiPC.2018.00023

K. Natarajan, N. Chandrachoodan

Citations: 3

Abstract

Turbo decoders use the recursive BCJR algorithm, which is computationally intensive and hard to parallelise. The branch metric and extrinsic log-likelihood ratio computations are easily parallelisable, but the forward and backward metric computations are recursive and not readily parallelisable without compromising the bit error rate (BER). This paper proposes a lossless parallelisation technique for Turbo decoders on Graphics Processing Units (GPUs). The recursive forward and backward metric computation is formulated as a prefix (scan) matrix multiplication problem, which is computed on the GPU using a parallel prefix sum technique. Overall, this method achieves a throughput of 73 Mbps for a 3GPP LTE compliant turbo decoder with no BER loss and a latency as low as 61 μs.
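The abstract does not spell out the matrix formulation, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes max-log-MAP metrics, where the forward recursion alpha_k[s] = max over s' of (alpha_{k-1}[s'] + gamma_k[s'][s]) is a vector-matrix product in the (max, +) semiring. Because that product is associative, all prefix products gamma_1 ⊗ ... ⊗ gamma_k can be obtained with a parallel scan. The function names, the dense 8x8 matrices, the integer metrics, and the Hillis-Steele scan are illustrative choices, not the authors' implementation, and the scan is simulated on the CPU in plain Python.

```python
import random

def maxplus_matmul(A, B):
    """(max,+) matrix product: C[i][j] = max over k of A[i][k] + B[k][j]."""
    n = len(A)
    return [[max(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def sequential_alpha(alpha0, gammas):
    """Reference recursion: alpha_k[s] = max over s' of alpha_{k-1}[s'] + gamma_k[s'][s]."""
    alphas, alpha = [], alpha0
    n = len(alpha0)
    for G in gammas:
        alpha = [max(alpha[sp] + G[sp][s] for sp in range(n)) for s in range(n)]
        alphas.append(alpha)
    return alphas

def inclusive_scan(items, op):
    """Hillis-Steele inclusive scan for an associative (not necessarily
    commutative) op. Every element of one pass is independent of the others,
    so on a GPU each could be handled by its own thread."""
    out = list(items)
    step = 1
    while step < len(out):
        # The earlier segment is always the left operand, preserving order.
        out = [out[i] if i < step else op(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out

def scan_alpha(alpha0, gammas):
    """Forward metrics from prefix products: alpha_k = alpha0 (x) (gamma_1 (x) ... (x) gamma_k)."""
    prefixes = inclusive_scan(gammas, maxplus_matmul)
    n = len(alpha0)
    return [[max(alpha0[sp] + P[sp][s] for sp in range(n)) for s in range(n)]
            for P in prefixes]

# Hypothetical test data: 8 trellis states, 40 trellis steps, integer branch
# metrics so that both evaluation orders agree exactly.
random.seed(0)
NUM_STATES, NUM_STEPS = 8, 40
gammas = [[[random.randint(-20, 20) for _ in range(NUM_STATES)]
           for _ in range(NUM_STATES)] for _ in range(NUM_STEPS)]
alpha0 = [0] + [-10**6] * (NUM_STATES - 1)  # trellis assumed to start in state 0

assert scan_alpha(alpha0, gammas) == sequential_alpha(alpha0, gammas)
```

The scan needs about log2(N) passes instead of N sequential steps, at the cost of doing full matrix-matrix products rather than vector-matrix products. The backward metrics could be handled the same way by scanning the matrices in reverse order. With floating-point metrics, reordering the additions may change rounding, which is why this sketch uses integers for its equality check.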
