语音质量指标对背景噪声和网络退化的鲁棒性:比较ViSQOL、PESQ和POLQA

2013 IEEE International Conference on Acoustics, Speech and Signal Processing Pub Date : 2013-05-26 DOI:10.1109/ICASSP.2013.6638348

Andrew Hines, J. Skoglund, A. Kokaram, N. Harte

{"title":"语音质量指标对背景噪声和网络退化的鲁棒性:比较ViSQOL、PESQ和POLQA","authors":"Andrew Hines, J. Skoglund, A. Kokaram, N. Harte","doi":"10.1109/ICASSP.2013.6638348","DOIUrl":null,"url":null,"abstract":"The Virtual Speech Quality Objective Listener (ViSQOL) is a new objective speech quality model. It is a signal based full reference metric that uses a spectro-temporal measure of similarity between a reference and a test speech signal. ViSQOL aims to predict the overall quality of experience for the end listener whether the cause of speech quality degradation is due to ambient noise, or transmission channel degradations. This paper describes the algorithm and tests the model using two speech corpora: NOIZEUS and E4. The NOIZEUS corpus contains speech under a variety of background noise types, speech enhancement methods, and SNR levels. The E4 corpus contains voice over IP degradations including packet loss, jitter and clock drift. The results are compared with the ITU-T objective models for speech quality: PESQ and POLQA. The behaviour of the metrics are also evaluated under simulated time warp conditions. The results show that for both datasets ViSQOL performed comparably with PESQ. POLQA was shown to have lower correlation with subjective scores than the other metrics for the NOIZEUS database.","PeriodicalId":183968,"journal":{"name":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":"{\"title\":\"Robustness of speech quality metrics to background noise and network degradations: Comparing ViSQOL, PESQ and POLQA\",\"authors\":\"Andrew Hines, J. Skoglund, A. Kokaram, N. Harte\",\"doi\":\"10.1109/ICASSP.2013.6638348\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Virtual Speech Quality Objective Listener (ViSQOL) is a new objective speech quality model. It is a signal based full reference metric that uses a spectro-temporal measure of similarity between a reference and a test speech signal. ViSQOL aims to predict the overall quality of experience for the end listener whether the cause of speech quality degradation is due to ambient noise, or transmission channel degradations. This paper describes the algorithm and tests the model using two speech corpora: NOIZEUS and E4. The NOIZEUS corpus contains speech under a variety of background noise types, speech enhancement methods, and SNR levels. The E4 corpus contains voice over IP degradations including packet loss, jitter and clock drift. The results are compared with the ITU-T objective models for speech quality: PESQ and POLQA. The behaviour of the metrics are also evaluated under simulated time warp conditions. The results show that for both datasets ViSQOL performed comparably with PESQ. POLQA was shown to have lower correlation with subjective scores than the other metrics for the NOIZEUS database.\",\"PeriodicalId\":183968,\"journal\":{\"name\":\"2013 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"volume\":\"38 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"34\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE International Conference on Acoustics, Speech and Signal Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICASSP.2013.6638348\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Conference on Acoustics, Speech and Signal Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICASSP.2013.6638348","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 34

摘要

虚拟语音质量客观听者(ViSQOL)是一种新的客观语音质量模型。它是一种基于信号的全参考度量，使用参考和测试语音信号之间的相似度的光谱-时间度量。ViSQOL旨在预测最终听者的整体体验质量，无论语音质量下降的原因是由于环境噪声还是由于传输信道的退化。本文描述了该算法，并使用NOIZEUS和E4两个语音语料库对模型进行了测试。NOIZEUS语料库包含各种背景噪声类型、语音增强方法和信噪比水平下的语音。E4语料库包含IP语音降级，包括丢包、抖动和时钟漂移。结果与ITU-T的语音质量目标模型PESQ和POLQA进行了比较。在模拟的时间扭曲条件下，还评估了指标的行为。结果表明，在这两个数据集上，ViSQOL与PESQ的表现相当。与NOIZEUS数据库的其他指标相比，POLQA与主观评分的相关性较低。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Robustness of speech quality metrics to background noise and network degradations: Comparing ViSQOL, PESQ and POLQA

The Virtual Speech Quality Objective Listener (ViSQOL) is a new objective speech quality model. It is a signal based full reference metric that uses a spectro-temporal measure of similarity between a reference and a test speech signal. ViSQOL aims to predict the overall quality of experience for the end listener whether the cause of speech quality degradation is due to ambient noise, or transmission channel degradations. This paper describes the algorithm and tests the model using two speech corpora: NOIZEUS and E4. The NOIZEUS corpus contains speech under a variety of background noise types, speech enhancement methods, and SNR levels. The E4 corpus contains voice over IP degradations including packet loss, jitter and clock drift. The results are compared with the ITU-T objective models for speech quality: PESQ and POLQA. The behaviour of the metrics are also evaluated under simulated time warp conditions. The results show that for both datasets ViSQOL performed comparably with PESQ. POLQA was shown to have lower correlation with subjective scores than the other metrics for the NOIZEUS database.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2013 IEEE International Conference on Acoustics, Speech and Signal Processing

自引率

0.00%

发文量