Title: Token-Prediction-Based Post-Processing for Low-Bitrate Speech Coding
Authors: Fei Liu; Yang Ai; Zhen-Hua Ling
Journal: IEEE Signal Processing Letters, vol. 32, pp. 3235-3239 (Impact Factor 3.9, JCR Q2)
Publication date: 2025-08-07
DOI: 10.1109/LSP.2025.3596826
URL: https://ieeexplore.ieee.org/document/11119413/
Citation count: 0
Abstract
Low-bitrate speech coding plays an essential role in speech transmission and storage. However, with current coding methods, speech quality degrades noticeably at low bitrates. This letter therefore proposes a novel Token-Prediction-based Post-Processing (T3P) model to improve the quality of low-bitrate coded speech. Unlike existing post-processing methods, T3P operates in the discrete domain and is centered on the prediction and classification of discrete tokens. Specifically, conditioned on features of the low-bitrate coded speech, T3P starts from a random token and sequentially predicts the token sequences produced by a residual vector quantization (RVQ) based neural codec; these tokens are then decoded to reconstruct the raw speech. Experiments confirm that T3P surpasses flow-matching-based and speech-enhancement-based baselines, achieving a better trade-off between speech quality and efficiency. With T3P, Encodec at just 0.5 kbps exceeds its original 4 kbps results for 16 kHz speech coding.
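For readers unfamiliar with the RVQ tokens mentioned in the abstract, the following is a minimal, generic sketch of residual vector quantization and of how a token stream maps to a bitrate. It is not the T3P model or the Encodec implementation: the function names (`rvq_encode`, `rvq_decode`, `bitrate_bps`), the nearest-neighbour search, and all numeric values are assumptions chosen for illustration, and none of them are taken from the letter.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize one feature vector with a stack of codebooks.

    Each stage picks the codeword nearest to the current residual and
    passes what is left over to the next stage. Returns one token
    (codeword index) per stage.
    """
    residual = np.asarray(x, dtype=float).copy()
    tokens = []
    for cb in codebooks:
        # Squared Euclidean distance from the residual to every codeword.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        tokens.append(idx)
        residual = residual - cb[idx]
    return tokens

def rvq_decode(tokens, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    return sum(cb[i] for cb, i in zip(codebooks, tokens))

def bitrate_bps(frame_rate_hz, num_codebooks, codebook_size):
    """Bits per second of a token stream (hypothetical helper):
    frames/s * tokens/frame * bits/token."""
    return float(frame_rate_hz * num_codebooks * np.log2(codebook_size))
```

Under these assumptions, transmitting fewer codebook stages lowers the bitrate but leaves a larger residual unquantized. A post-processor that predicts the full token sequence from the low-bitrate signal, as the abstract describes, would be trying to recover that lost detail at the receiver.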
About the journal:
The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.