Matching pursuits sinusoidal speech coding

IEEE Trans. Speech Audio Process. Publication date: 2003-08-26. DOI: 10.1109/TSA.2003.815520

Ç. Etemoglu, V. Cuperman

Pages: 413-424

Citations: 14

Abstract

This paper introduces a sinusoidal modeling technique for low-bit-rate speech coding in which the parameters of each sinusoidal component are extracted sequentially by closed-loop analysis. The sinusoidal modeling of the speech linear prediction (LP) residual is performed within the general framework of matching pursuits with a dictionary of sinusoids. The frequency space of the sinusoids is restricted to sets of frequency intervals, or bins; together with the closed-loop analysis, this restriction allows the frequencies of the sinusoids to be mapped into a frequency vector that can be quantized efficiently. In voiced frames, two sets of frequency vectors are generated: one represents the harmonically related components of the voiced segment and the other the nonharmonically related components. This approach eliminates the need for a voicing-dependent cutoff frequency, which is difficult to estimate correctly and to quantize at low bit rates. In transition frames, to efficiently extract and quantize the set of frequencies needed for the sinusoidal representation of the LP residual, we introduce frequency bin vector quantization (FBVQ). FBVQ selects a vector of nonuniformly spaced frequencies from a frequency codebook to represent the frequency-domain information in transition regions. Our use of FBVQ with closed-loop searching contributes to an improvement in speech quality in transition frames. The effectiveness of the coding scheme is further enhanced by exploiting the critical-band concept of auditory perception in defining the frequency bins. To demonstrate the viability and advantages of the new models, we designed a 4 kbps matching pursuits sinusoidal speech coder. Subjective results indicate that the proposed coder at 4 kbps has quality exceeding that of the 6.3 kbps G.723.1 coder.
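The analysis loop is described above only in outline. The sketch below illustrates the general idea of matching pursuits over a bin-restricted sinusoidal dictionary under assumptions of our own, not the paper's exact algorithm: a residual frame `x` sampled at 8 kHz, non-overlapping bins equally spaced on the Bark scale (Traunmueller's approximation, standing in for the paper's critical-band bins), a fixed frequency grid inside each bin, and at most one sinusoid per bin, chosen greedily by the energy it removes from the residual. All names (`bark_bins`, `fit_sinusoid`, `bin_matching_pursuit`) and parameters (`n_atoms`, `grid_points`) are hypothetical, and the harmonic/nonharmonic split and FBVQ are not modeled.

```python
import math


def hz_to_bark(f: float) -> float:
    """Traunmueller's Bark-scale approximation (assumed stand-in for critical bands)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z: float) -> float:
    """Exact inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def bark_bins(f_lo: float, f_hi: float, n_bins: int) -> list[tuple[float, float]]:
    """Split [f_lo, f_hi] Hz into n_bins intervals of equal width in Bark."""
    z_lo, z_hi = hz_to_bark(f_lo), hz_to_bark(f_hi)
    edges = [bark_to_hz(z_lo + (z_hi - z_lo) * i / n_bins) for i in range(n_bins + 1)]
    return list(zip(edges[:-1], edges[1:]))


def fit_sinusoid(r: list[float], w: float) -> tuple[float, float, float]:
    """Least-squares fit of a*cos(w n) + b*sin(w n) to r.

    Returns (a, b, energy_reduction); the reduction equals a*<r,cos> + b*<r,sin>.
    """
    cc = cs = ss = xc = xs = 0.0
    for n, v in enumerate(r):
        c, s = math.cos(w * n), math.sin(w * n)
        cc += c * c
        cs += c * s
        ss += s * s
        xc += v * c
        xs += v * s
    det = cc * ss - cs * cs
    if det <= 1e-12:  # degenerate atom (w at 0 or Nyquist): skip it
        return 0.0, 0.0, 0.0
    a = (xc * ss - xs * cs) / det  # solve the 2x2 normal equations
    b = (xs * cc - xc * cs) / det
    return a, b, a * xc + b * xs


def bin_matching_pursuit(x, fs, bins, n_atoms, grid_points=32):
    """Greedy (closed-loop) pursuit: each pass tries every unused bin on a grid,
    keeps the atom that removes the most residual energy, subtracts it, and
    retires that bin. Returns ([(freq_hz, amplitude, phase_rad), ...], residual)."""
    r = list(x)
    params = []
    free = list(range(len(bins)))
    for _ in range(min(n_atoms, len(bins))):
        best = None
        for k in free:
            lo, hi = bins[k]
            for g in range(grid_points):
                f = lo + (hi - lo) * (g + 0.5) / grid_points  # grid cell centres
                a, b, gain = fit_sinusoid(r, 2 * math.pi * f / fs)
                if best is None or gain > best[0]:
                    best = (gain, k, f, a, b)
        _, k, f, a, b = best
        w = 2 * math.pi * f / fs
        r = [v - (a * math.cos(w * n) + b * math.sin(w * n)) for n, v in enumerate(r)]
        free.remove(k)
        # a*cos + b*sin == A*cos(w n + phi) with A = hypot(a, b), phi = atan2(-b, a)
        params.append((f, math.hypot(a, b), math.atan2(-b, a)))
    return params, r


if __name__ == "__main__":
    fs, n = 8000, 160  # assumed: 8 kHz sampling, 20 ms frame
    x = [math.cos(2 * math.pi * 500 * k / fs + 0.3)
         + 0.5 * math.cos(2 * math.pi * 1500 * k / fs) for k in range(n)]
    params, resid = bin_matching_pursuit(x, fs, bark_bins(100.0, 3800.0, 12), n_atoms=2)
    for f, amp, ph in params:
        print(f"{f:7.1f} Hz  amp={amp:.3f}  phase={ph:+.3f} rad")
```

In this sketch every selected frequency stays inside a known bin, so it could be described by a bin index plus an in-bin offset. That is one plausible reading of why the bin restriction makes the frequency vector cheap to quantize, but the quantizer itself is not sketched here.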

