A scalable coder designed for 10-kHz bandwidth speech

M. Oshikiri, H. Ehara, K. Yoshida
Published in: Speech Coding, 2002, IEEE Workshop Proceedings. (2002-10-06)
DOI: 10.1109/SCW.2002.1215741 (https://doi.org/10.1109/SCW.2002.1215741)
Citations: 5

Abstract

This paper presents a scalable speech coder with a total rate of 23.85 kbit/s for encoding 10-kHz bandwidth speech signals. The perceptual quality of 10-kHz bandwidth speech is much better than that of 7-kHz bandwidth speech and is close to that of 20-kHz bandwidth speech. The 10-kHz bandwidth is therefore promising for high-fidelity conversational applications. The scalable coder consists of two layers: a base layer and an enhancement layer. The adaptive multi-rate wideband speech coder (AMR-WB) at 15.85 kbit/s is used for the base layer, and a transform coding method at 8 kbit/s is used for the enhancement layer. This hybrid structure allows the 10-kHz bandwidth speech to be coded efficiently. The enhancement layer uses the modified discrete cosine transform (MDCT). Its analysis frame size is kept short in order to minimize additional algorithmic delay; the total additional algorithmic delay of the enhancement layer is 5 ms. Since it is difficult to quantize all the MDCT coefficients at 8 kbit/s, we have restricted quantization to the region from 6 kHz to 9 kHz to improve the perceptual quality of the decoded speech. Our subjective evaluation test results indicate that the quality of the proposed coder clearly exceeds that of AMR-WB at 23.85 kbit/s under both clean and noisy conditions.
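The rate split and the band restriction described in the abstract can be illustrated with a small sketch. Only the two layer rates and the 6 kHz to 9 kHz region come from the abstract. The sampling rate, the MDCT hop size, and the names `mdct`, `bins_in_band`, `SAMPLE_RATE_HZ`, and `HOP` are assumptions chosen to give round numbers. They are not the paper's parameters, and the code below is not the authors' implementation.

```python
import math

BASE_RATE_KBPS = 15.85   # AMR-WB base layer (from the abstract)
ENH_RATE_KBPS = 8.0      # transform-coded enhancement layer (from the abstract)

# Assumptions for illustration only; the abstract does not state these values.
SAMPLE_RATE_HZ = 24_000  # hypothetical; any rate >= 20 kHz can carry a 10-kHz band
HOP = 120                # hypothetical MDCT hop size in samples (N coefficients per hop)


def mdct(block):
    """Direct O(N^2) MDCT of a block of 2N samples, returning N coefficients."""
    n_coef = len(block) // 2
    return [
        sum(
            x * math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))
            for n, x in enumerate(block)
        )
        for k in range(n_coef)
    ]


def bins_in_band(n_coef, sample_rate_hz, lo_hz, hi_hz):
    """Indices of MDCT bins whose centre frequency lies in [lo_hz, hi_hz)."""
    bin_width = sample_rate_hz / (2 * n_coef)
    return [k for k in range(n_coef) if lo_hz <= (k + 0.5) * bin_width < hi_hz]


# Total rate of the two-layer coder: base layer plus enhancement layer.
total_rate_kbps = BASE_RATE_KBPS + ENH_RATE_KBPS

# Enhancement-layer bits available per hop under the assumed framing.
enh_bits_per_hop = ENH_RATE_KBPS * 1000 * HOP / SAMPLE_RATE_HZ

# Only bins inside the 6 kHz to 9 kHz region would be quantized.
coded_bins = bins_in_band(HOP, SAMPLE_RATE_HZ, 6000, 9000)

# Example: transform one block and keep only the coefficients in the coded region.
block = [math.sin(2 * math.pi * 7000 * n / SAMPLE_RATE_HZ) for n in range(2 * HOP)]
coefficients = mdct(block)
kept = {k: coefficients[k] for k in coded_bins}

print(f"total rate: {total_rate_kbps:.2f} kbit/s")
print(f"enhancement bits per hop: {enh_bits_per_hop:.0f}")
print(f"bins coded: {len(coded_bins)} of {HOP}")
```

Under these assumed values the enhancement layer has 40 bits per hop, to be spent on 30 bins rather than all 120. This shows in rough terms why restricting the quantized region helps at 8 kbit/s. The actual bit allocation and quantizer are not described in the abstract.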