Real-Time Estimation of Speech Quality through the Internet Using Echo State Networks
Sebastián Basterrech, G. Rubino
Journal of Advances in Computer Networks
DOI: 10.7763/JACN.2013.V1.37
Abstract
Audio quality on the Internet can be strongly affected by network conditions, and many techniques have been developed to evaluate it. In particular, in 2001 the ITU-T adopted a technique called Perceptual Evaluation of Speech Quality (PESQ) to measure speech quality automatically. PESQ is a well-known and widely used procedure that generally provides an accurate evaluation of perceived quality by comparing the original and received voice sequences. Its obvious inherent limitation is that it requires the original signal (the reference) to perform its evaluation. This precludes using PESQ to assess perceived quality in real time, since the reference is generally not available. In this paper, we describe a procedure that estimates the PESQ output using only measurements of the network state and properties of the communication system, without any use of the reference. It is based on statistical learning techniques. Specifically, we rely on recent ideas for learning with a particular type of neural network known as Echo State Networks (ESNs), a member of the class of Reservoir Computing systems. These tools have been shown to be efficient and robust in many learning tasks. The experimental results show that the resulting procedure achieves good accuracy and can estimate speech quality in a real-time context. This makes it possible to embed our measurement modules in future Internet applications or services based on voice transmission, for instance for control purposes.