Token-Prediction-Based Post-Processing for Low-Bitrate Speech Coding

Impact factor 3.9 · Zone 2, Engineering & Technology · JCR Q2, ENGINEERING, ELECTRICAL & ELECTRONIC
Fei Liu;Yang Ai;Zhen-Hua Ling
IEEE Signal Processing Letters, vol. 32, pp. 3235-3239. Published 2025-08-07. DOI: 10.1109/LSP.2025.3596826. Journal article, not open access. Source: https://ieeexplore.ieee.org/document/11119413/
Citations: 0

Abstract

Low-bitrate speech coding plays an essential role in speech transmission and storage. However, with current coding methods, speech quality degrades noticeably at low bitrates. This letter therefore proposes a Token-Prediction-based Post-Processing (T3P) model to improve the quality of low-bitrate coded speech. Unlike existing post-processing methods, T3P operates in the discrete domain and is built around predicting and classifying discrete tokens. Specifically, conditioned on features of the low-bitrate coded speech, T3P starts from a random token and sequentially predicts the token sequences produced by a residual vector quantization (RVQ) based neural codec; these tokens are then decoded to reconstruct the raw speech. Experiments confirm that T3P surpasses flow-matching-based and speech-enhancement-based baselines, achieving a better trade-off between speech quality and efficiency. With T3P, Encodec at just 0.5 kbps exceeds its original 4 kbps results for 16 kHz speech coding.
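To make the terms in the abstract concrete, the following is a minimal sketch of residual vector quantization and of how a token bitrate is computed. It is not the authors' implementation and does not model T3P's predictor. The toy codebooks, the function names (`rvq_encode`, `rvq_decode`, `bitrate_bps`), and the frame rate and codebook size used in the bitrate example are all assumptions chosen for illustration; the abstract does not state the actual codec configuration.

```python
import math


def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage codes the residual
    left over by the stages before it. Returns one token per stage."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct the vector as the sum of the selected codewords."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


def bitrate_bps(frame_rate_hz, num_codebooks, codebook_size):
    """Bits per second = frames/s * tokens/frame * bits/token."""
    return frame_rate_hz * num_codebooks * math.log2(codebook_size)


if __name__ == "__main__":
    # Hypothetical two-stage codebooks: a coarse stage, then a finer one.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, 0.0], [0.25, -0.25]],
    ]
    tokens = rvq_encode([1.2, 0.8], codebooks)
    print("tokens:", tokens)
    print("reconstruction:", rvq_decode(tokens, codebooks))

    # Assumed values (50 frames/s, 1024-entry codebooks), picked only so
    # that the arithmetic lands on the two bitrates named in the abstract.
    print("1 codebook :", bitrate_bps(50, 1, 1024), "bps")
    print("8 codebooks:", bitrate_bps(50, 8, 1024), "bps")
```

Under these assumed values, transmitting one codebook per frame costs 0.5 kbps and eight cost 4 kbps. A post-processor of the kind the abstract describes would receive only the low-bitrate signal and predict the richer token sequence at the receiver, so the extra tokens are never transmitted.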
Source journal
IEEE Signal Processing Letters (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 7.40
Self-citation rate: 12.80%
Articles published: 339
Review time: 2.8 months
Journal description: The IEEE Signal Processing Letters is a monthly, archival publication designed to provide rapid dissemination of original, cutting-edge ideas and timely, significant contributions in signal, image, speech, language and audio processing. Papers published in the Letters can be presented within one year of their appearance in signal processing conferences such as ICASSP, GlobalSIP and ICIP, and also in several workshops organized by the Signal Processing Society.