{"title":"Lattice-Reduction-Aided Symbol-Wise Intra-Iterative Interference Cancellation Detector for Massive MIMO System","authors":"Hsiao-Yu Yeh, Yuan-Hao Huang","doi":"10.1109/SiPS47522.2019.9020430","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020430","url":null,"abstract":"Massive multiple-input multiple-output (MIMO) system plays an important role of increasing spectral efficiency in the fifth-generation (5G) cellular communication. The MIMO detection complexity increases significantly along with the number of antennas. Thus, the design of high-performance low-complexity detector for massive MIMO is a challenging design issue for the 5G system. This paper proposes a lattice-reduction-aided (LRA) symbol-wise (SW) detection technique to enhance the performance of the intra-iterative interference cancellation (IIC) detector based on Newton’s method. The proposed SW IIC detector has near minimum-mean-square-error performance with faster convergence speed and lower computational complexity than the original IIC detector. In a 64-QAM $128 \\times 8$ up-link MIMO system, the proposed LRA SW IIC detector reduces about 95.35% computational complexity of the original IIC detector under the same BER performance. Considering the preprocessing complexity of the LR in the time-varying channel, the proposed LRA SW IIC detector still has lower complexity when the coherent frame size is larger than 12 MIMO symbols.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128393962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Evaluation of a Power-Efficient Approximate Systolic Array Architecture for Matrix Multiplication","authors":"Haroon Waris, Chenghua Wang, Weiqiang Liu, F. Lombardi","doi":"10.1109/SiPS47522.2019.9020404","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020404","url":null,"abstract":"Matrix multiplication (MM) is a basic operation for many Digital Signal Processing applications. A Systolic Array (SA) is often considered as one of the most favorable architecture to achieve high performance for matrix multiplication. In this paper, the design exploration for an approximate SA is pursued; three design schemes are proposed by introducing approximation in multiple sub-modules. An approximation factor $\\alpha$ is introduced; it is related to the inexact columns in the SA to explore the accuracy-efficiency trade-off present in the proposed designs. In the evaluation, an 8-bit input operand matrix multiplication is considered; the Synopsys Design Compiler at 45nm technology node is used to establish hardware-related metrics. The Error Rate (ER), Normalized Mean Error Distance (NMED) and Mean Relative Error Distance (MRED) are used as figures of merit for error analysis. Results show that the proposed architecture for 8-bit matrix multiplication with an approximation factor $\\alpha=7$ has the lower power consumption compared to existing inexact designs found in the technical literature with comparable NMED. In addition, a power delay product vs NMED analysis shows the proposed designs have a lower PDP so applicable to low power applications. The practicality of the proposed architecture is established by computing the Discrete Cosine Transform.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129493152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AVX-512 Based Software Decoding for 5G LDPC Codes","authors":"Yi Xu, Wen Wang, Z. Xu, Xiqi Gao","doi":"10.1109/SiPS47522.2019.9020587","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020587","url":null,"abstract":"In this paper, we investigate how the 5G NR LDPC codes can be decoded by GPP effectively with single instruction-multiple-data (SIMD) acceleration and evaluate the corresponding achievable throughput on newly released Intel Xeon CPUs. Firstly, a general software implementation architecture with SIMD acceleration for horizontal-layered LDPC decoding is presented, where the parallelism can be achieved in an intra-block manner. By utilizing Intel advanced vector extended 512 (AVX-512) instruction set, the efficiency of parallelism are maximized and therefore the capacity of x86 processors can be fully exploited. In addition, new features of AVX-512 are further exploited to optimize load and store operations as well as preprocessing to reduce the operation cost. Experiments results also show that Intel Xeon Gold 6154 processors can achieve 42 to 272 Mbps throughput with a single core for ten layered decoding iterations for various code rate and block length. The typical processing latency is below 100 $\\mu$s. Consequently, an 18-core Intel Xeon CPU can achieve up to 5 Gbps decoding throughput.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"161 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115265136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Unified and Flexible Eigen-Solver for Rank-Deficient Matrix in MIMO Precoding/Beamforming Applications","authors":"Su-An Chou, A. E. Rakhmania, P. Tsai","doi":"10.1109/SiPS47522.2019.9020368","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020368","url":null,"abstract":"Eigenvalue decomposition (EVD) is a widely adopted technique to separate signal, interference, and noise subspaces. The paper presents a unified eigen-solver based on QR decomposition (QRD) to generate eigenpairs associated with the largest eigenvalues or zero eigenvalues, which are required in the MIMO hybrid beamforming systems that need interference suppression. A non-uniformly constrained deflation is proposed, which forces the matrix to deflate in the beginning and efficiently allocates the computation power to the eigenpairs related with the largest eigenvalues. The computation complexity of generating interested eigenpairs is also evaluated for various matrix dimensions. The results demonstrate that the non-uniformly constrained deflation is effective and more computations can be saved if the desired number of eigenpairs is smaller than the rank of the matrix.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126381815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Distributed Detection Algorithm For Uplink Massive MIMO Systems","authors":"Qiufeng Liu, Hao Liu, Ying Yan, Peng Wu","doi":"10.1109/SiPS47522.2019.9020489","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020489","url":null,"abstract":"Massive multiple-input multiple-output (MIMO) uplink detection algorithms usually rely on centralized base station (BS) architecture, which results in excessive amount of raw baseband data to be transmitted to central processing unit (CU) when the number of antennas is large. Considering the channel hardening characteristics occurs in massive MIMO channels, this paper develops a novel distributed algorithm based on a daisy chain architecture, where the BS antennas are divided into clusters and each owns independent computing hardware for signal processing. In distributed signal detection, only local channel state information (CSI), received data and some data exchange between clusters are needed on each cluster. It is demonstrated that the algorithm can achieve the tradeoff between complexity and performance better than other existing distributed methods.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129585443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An ISAR Imaging Algorithm Based on RCA for Micro-Doppler Effect Suppression","authors":"Xinbo Xu, Xinfei Jin, Fulin Su","doi":"10.1109/SiPS47522.2019.9020383","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020383","url":null,"abstract":"In Inverse Synthetic Aperture Radar (ISAR) imaging, the micro-Doppler (m-D) effect caused by micro-motion parts of the target will not only make parameter extraction and motion compensation difficult but also cause image defocusing. It will appear as azimuth interference sidebands and decrease image quality seriously. Therefore, studying the micro-Doppler suppression problem in practical applications is of great importance in high-quality imaging of ISAR. In this paper, a reasonable and effective mathematical model is established, and the m-D suppression algorithm inspired by the robust principal component analysis (RPCA) matrix reconstruction theory is proposed. Our algorithm transforms the problem of separating radar echoes into the decomposition of a low rank rotating components m-D signal matrix and a sparse main body ISAR image signal matrix. Moreover, experimental results based on simulated and real measured data are utilized to verify the effectiveness of our method.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126853050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Low-Latency and Low-Complexity Hardware Architecture for CTC Beam Search Decoding","authors":"Siyuan Lu, Jinming Lu, Jun Lin, Zhongfeng Wang, L. Du","doi":"10.1109/SiPS47522.2019.9020324","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020324","url":null,"abstract":"The recurrent neural networks (RNNs) along with connectionist temporal classification (CTC) have been widely used in many sequence to sequence tasks, including automatic speech recognition (ASR), lipreading, and scene text recognition (STR). In these systems, CTC-trained RNNs usually require specific CTC-decoders after their output layers. Many existing CTC-trained RNN inference systems use FPGA to do calculations of RNNs, and decode their outputs on CPU. However, with the development of FPGA-based RNN hardware accelerators, existing CPU-based CTC-decoder can not meet the latency requirement of them. To resolve this issue, this paper proposes an efficient hardware architecture for the CTC beam search decoder based on the decoding method reported in our previous work. The experimental results show that the system latency per sample of the CTC-decoder is only 7.19us on Xilinx xc7vx1140tflg19301 FPGA platform, which is lower than state-of-the-art RNNs. We also implement the origin algorithm on the same FPGA platform. Comparison results show that the improved one reduces the system latency per sample by 63.67%, the LUTRAMs by 97.44%, the FFs by 79.55%, and the DSPs by 50%. To the best of our knowledge, this is the first work on hardware implementation for CTC beam search decoder.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128939027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"[Copyright notice]","authors":"","doi":"10.1109/sips47522.2019.9020396","DOIUrl":"https://doi.org/10.1109/sips47522.2019.9020396","url":null,"abstract":"","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131617679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A New Inversionless Berlekamp-Massey Algorithm with Efficient Architecture","authors":"Chao Chen, Y. Han, Zhongfeng Wang, B. Bai","doi":"10.1109/SiPS47522.2019.9020488","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020488","url":null,"abstract":"This paper presents a new inversionless Berlekamp-Massey (BM) algorithm as well as its efficient architecture. Starting with a lesser-known version of BM algorithm, we develop a serial of inversionless variants by successively applying algorithmic transformations. The final algorithm has a very compact description and a highly regular structure, which can be naturally mapped to a systolic architecture. Compared with the state-of-the-art architecture RiBM, the proposed one possesses a different cell structure and has slightly lower hardware requirements. More importantly, it enables us to establish a new architectural equivalence between the BM algorithm and the Euclidean algorithm.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131682738","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FPGA Prototyping of A Millimeter-Wave Multiple Gigabit WLAN System","authors":"Dongming Ren, Kang Chen, Shengheng Liu, Yongming Huang","doi":"10.1109/SiPS47522.2019.9020634","DOIUrl":"https://doi.org/10.1109/SiPS47522.2019.9020634","url":null,"abstract":"IEEE 802.11aj (45-GHz) standard is recently proposed for wireless local area network operating in an undefined millimeter-wave (mmWave) band. In this work, an ultra-high-speed mmWave orthogonal frequency division multiplexing transmission prototype is developed and some primary amendments in this standard are verified using NI-PXIe mmWave software-defined radio platform. A mixed parallel processing scheme is devised to meet the clock requirements of field programmable gate arrays baseband processing. A queue-based synchronization mechanism is designed to facilitate the implementation of data transporting. Data transmission test indicates that the system is able to achieve an extremely high data rate of multi-gigabits per second with a low bit error rate.","PeriodicalId":256971,"journal":{"name":"2019 IEEE International Workshop on Signal Processing Systems (SiPS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133917498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}