Accuracy of MP3 speech recognition under real-word conditions: Experimental study

P. Pollák, Martin Behunek
{"title":"Accuracy of MP3 speech recognition under real-word conditions: Experimental study","authors":"P. Pollák, Martin Behunek","doi":"10.5220/0003512600050010","DOIUrl":null,"url":null,"abstract":"This paper presents the study of speech recognition accuracy with respect to different levels of MP3 compression. Special attention is focused on the processing of speech signals with different quality, i.e. with different level of background noise and channel distortion. The work was motivated by possible usage of ASR for offline automatic transcription of audio recordings collected by standard wide-spread MP3 devices. The realized experiments have proved that although MP3 format is not optimal for speech compression it does not distort speech significantly especially for high or moderate bit rates and high quality of source data. The accuracy of connected digits ASR decreased consequently very slowly up to the bit rate 24 kbps. For the best case of PLP parameterization in close-talk channel just 3% decrease of recognition accuracy was observed while the size of the compressed file was approximately 10% of the original size. All results were slightly worse under presence of additive background noise and channel distortion in a signal but achieved accuracy was also acceptable in this case especially for PLP features.","PeriodicalId":103791,"journal":{"name":"Proceedings of the International Conference on Signal Processing and Multimedia Applications","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the International Conference on Signal Processing and Multimedia Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5220/0003512600050010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

This paper presents the study of speech recognition accuracy with respect to different levels of MP3 compression. Special attention is focused on the processing of speech signals with different quality, i.e. with different level of background noise and channel distortion. The work was motivated by possible usage of ASR for offline automatic transcription of audio recordings collected by standard wide-spread MP3 devices. The realized experiments have proved that although MP3 format is not optimal for speech compression it does not distort speech significantly especially for high or moderate bit rates and high quality of source data. The accuracy of connected digits ASR decreased consequently very slowly up to the bit rate 24 kbps. For the best case of PLP parameterization in close-talk channel just 3% decrease of recognition accuracy was observed while the size of the compressed file was approximately 10% of the original size. All results were slightly worse under presence of additive background noise and channel distortion in a signal but achieved accuracy was also acceptable in this case especially for PLP features.
真实世界条件下MP3语音识别的准确性实验研究
本文研究了不同MP3压缩水平下的语音识别精度。特别关注不同质量的语音信号的处理,即不同程度的背景噪声和信道失真。这项工作的动机是可能使用ASR对标准广泛使用的MP3设备收集的录音进行离线自动转录。已实现的实验证明,虽然MP3格式不是语音压缩的最佳格式,但对于高或中等比特率和高质量的源数据,它不会造成明显的语音失真。因此,连接数字ASR的精度下降非常缓慢,直至比特率为24kbps。当压缩文件的大小约为原始大小的10%时,PLP参数化在近距离通道中的最佳情况下,识别精度仅下降3%。在信号中存在附加背景噪声和通道失真的情况下,所有结果都稍差,但在这种情况下,特别是对于PLP特征,达到的精度也是可以接受的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信