A scalable coder designed for 10-kHz bandwidth speech

Speech Coding, 2002, IEEE Workshop Proceedings. Pub Date: 2002-10-06. DOI: 10.1109/SCW.2002.1215741

M. Oshikiri, H. Ehara, K. Yoshida


Citations: 5

Abstract

This paper presents a scalable speech coder that encodes 10-kHz bandwidth speech signals at 23.85 kbit/s. The perceptual quality of 10-kHz bandwidth speech is much better than that of 7-kHz bandwidth speech and close to that of 20-kHz bandwidth speech, which makes the 10-kHz bandwidth promising for high-fidelity conversational applications. The scalable coder consists of two layers: a base layer and an enhancement layer. The base layer uses the adaptive multi-rate wideband speech coder (AMR-WB) at 15.85 kbit/s, and the enhancement layer uses a transform coding method at 8 kbit/s. This hybrid structure ensures efficient coding of 10-kHz bandwidth speech. The enhancement layer uses the modified discrete cosine transform (MDCT) with a short analysis frame to minimize additional algorithmic delay; the total additional algorithmic delay of the enhancement layer is 5 ms. Since it is difficult to quantize all the MDCT coefficients at 8 kbit/s, quantization is restricted to the region from 6 kHz to 9 kHz to improve the perceptual quality of the decoded speech. Subjective evaluation results indicate that the quality of the proposed coder clearly exceeds that of AMR-WB at 23.85 kbit/s under both clean and noisy conditions.
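The bit budget and the band restriction described in the abstract can be illustrated with a minimal Python sketch. Only the layer rates (15.85 and 8 kbit/s) and the 6-kHz to 9-kHz region come from the abstract. The 24-kHz sampling rate, the 120-coefficient MDCT size, the sine window, and all function names are assumptions chosen for illustration and are not the authors' implementation; the 20-ms frame is the standard AMR-WB frame length and is used here only to convert rates into bits.

```python
import math

# Rates stated in the abstract (kbit/s).
BASE_RATE_KBPS = 15.85   # AMR-WB base layer
ENH_RATE_KBPS = 8.0      # transform-coded enhancement layer

# Illustrative assumptions, NOT taken from the paper.
SAMPLE_RATE_HZ = 24000   # hypothetical sampling rate
N_BINS = 120             # hypothetical MDCT size: 120 coefficients per 5-ms hop
FRAME_MS = 20.0          # AMR-WB frame length, used only for the bit budget


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """Bits available in one frame (kbit/s times ms gives bits)."""
    return round(rate_kbps * frame_ms)


def mdct(block):
    """Direct O(N^2) MDCT with a sine window: 2N samples in, N coefficients out."""
    n = len(block) // 2
    out = []
    for k in range(n):
        acc = 0.0
        for i, x in enumerate(block):
            window = math.sin(math.pi * (i + 0.5) / (2 * n))
            acc += window * x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
        out.append(acc)
    return out


def band_bins(low_hz, high_hz, n_bins=N_BINS, fs=SAMPLE_RATE_HZ):
    """Indices of MDCT bins whose centre frequency falls in [low_hz, high_hz)."""
    width = fs / 2 / n_bins
    return [k for k in range(n_bins) if low_hz <= (k + 0.5) * width < high_hz]


def bits_per_coefficient(n_coeffs, n_bins=N_BINS, fs=SAMPLE_RATE_HZ):
    """Average enhancement-layer bits per coefficient if n_coeffs are coded per hop."""
    hop_ms = 1000.0 * n_bins / fs
    return ENH_RATE_KBPS * hop_ms / n_coeffs


if __name__ == "__main__":
    selected = band_bins(6000, 9000)
    print(bits_per_frame(BASE_RATE_KBPS), bits_per_frame(ENH_RATE_KBPS))
    print(len(selected), bits_per_coefficient(len(selected)), bits_per_coefficient(N_BINS))
```

Under these assumed parameters the 6-kHz to 9-kHz region covers 30 of the 120 bins, so the 8 kbit/s budget averages about 1.33 bits per coefficient instead of about 0.33 bits if every coefficient were quantized. This is one way to see why restricting the quantized region helps at this rate; the paper's actual frame size and quantizer may differ.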
