Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks

IF 1.3 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC
Mariem Bouafif Mansali, Pablo Pérez Zarazaga, Tom Bäckström, Z. Lachiri
{"title":"Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks","authors":"Mariem Bouafif Mansali, Pablo Pérez Zarazaga, Tom Bäckström, Z. Lachiri","doi":"10.3389/frsip.2022.800003","DOIUrl":null,"url":null,"abstract":"The use of speech source localization (SSL) and its applications offer great possibilities for the design of speaker local positioning systems with wireless acoustic sensor networks (WASNs). Recent works have shown that data-driven front-ends can outperform traditional algorithms for SSL when trained to work in specific domains, depending on factors like reverberation and noise levels. However, such localization models consider localization directly from raw sensor observations, without consideration for transmission losses in WASNs. In contrast, when sensors reside in separate real-life devices, we need to quantize, encode and transmit sensor data, decreasing the performance of localization, especially when the transmission bitrate is low. In this work, we investigate the effect of low bitrate transmission on a Direction of Arrival (DoA) estimator. We analyze a deep neural network (DNN) based framework performance as a function of the audio encoding bitrate for compressed signals by employing recent communication codecs including PyAWNeS, Opus, EVS, and Lyra. Experimental results show that training the DNN on input encoded with the PyAWNeS codec at 16.4 kB/s can improve the accuracy significantly, and up to 50% of accuracy degradation at a low bitrate for almost all codecs can be recovered. Our results further show that for the best accuracy of the trained model when one of the two channels can be encoded with a bitrate higher than 32 kB/s, it is optimal to have the raw data for the second channel. However, for a lower bitrate, it is preferable to similarly encode the two channels. More importantly, for practical applications, a more generalized model trained with a randomly selected codec for each channel, shows a large accuracy gain when at least one of the two channels is encoded with PyAWNeS.","PeriodicalId":93557,"journal":{"name":"Frontiers in signal processing","volume":"31 1","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2022-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in signal processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frsip.2022.800003","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Speech source localization (SSL) and its applications offer great possibilities for the design of speaker positioning systems based on wireless acoustic sensor networks (WASNs). Recent work has shown that data-driven front-ends can outperform traditional SSL algorithms when trained for specific domains, depending on factors such as reverberation and noise levels. However, such localization models operate directly on raw sensor observations, without accounting for transmission losses in WASNs. In practice, when the sensors reside in separate devices, the sensor data must be quantized, encoded, and transmitted, which degrades localization performance, especially when the transmission bitrate is low. In this work, we investigate the effect of low-bitrate transmission on a direction-of-arrival (DoA) estimator. We analyze the performance of a deep neural network (DNN) based framework as a function of the audio encoding bitrate of the compressed signals, using recent communication codecs including PyAWNeS, Opus, EVS, and Lyra. Experimental results show that training the DNN on input encoded with the PyAWNeS codec at 16.4 kbit/s improves accuracy significantly, recovering up to 50% of the accuracy lost at low bitrates for almost all codecs. Our results further show that when one of the two channels can be encoded at a bitrate above 32 kbit/s, the trained model is most accurate if the second channel is left as raw data, whereas at lower bitrates it is preferable to encode both channels similarly. More importantly, for practical applications, a more general model trained with a randomly selected codec for each channel shows a large accuracy gain when at least one of the two channels is encoded with PyAWNeS.
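The last result above describes training with a randomly selected codec per channel. A minimal sketch of such an augmentation step is given below, assuming a two-channel input and a hypothetical encode_decode helper standing in for the real PyAWNeS/Opus/EVS/Lyra round trip; the codec list and bitrate grid are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-in for a codec round trip (encode + decode) of one channel.
# A real pipeline would call the actual PyAWNeS / Opus / EVS / Lyra encoder and
# decoder here; this stub passes the signal through so the sketch stays runnable.
def encode_decode(signal: np.ndarray, codec: str, bitrate_kbps: float) -> np.ndarray:
    return signal.copy()

# Assumed codec/bitrate grid (kbit/s); "raw" leaves a channel uncoded.
CODEC_CHOICES = [
    ("pyawnes", 16.4), ("pyawnes", 32.0),
    ("opus", 16.0), ("opus", 32.0),
    ("evs", 16.4), ("evs", 32.0),
    ("lyra", 3.0),
    ("raw", None),
]

def augment_two_channel(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Encode each channel of a (2, n_samples) signal with a randomly chosen
    codec and bitrate, mimicking low-bitrate transmission over a WASN link."""
    out = np.empty_like(x)
    for ch in range(x.shape[0]):
        codec, bitrate = CODEC_CHOICES[rng.integers(len(CODEC_CHOICES))]
        if codec == "raw":
            out[ch] = x[ch]          # second channel kept as raw sensor data
        else:
            out[ch] = encode_decode(x[ch], codec, bitrate)
    return out

# Example: augment one two-channel training example before the DNN forward pass.
rng = np.random.default_rng(0)
example = rng.standard_normal((2, 16000))   # 1 s of two-channel audio at 16 kHz
augmented = augment_two_channel(example, rng)
```

In this sketch the random codec assignment happens independently per channel, so the DoA estimator sees all combinations of coded and uncoded inputs during training, which is the property the abstract credits for the generalized model's robustness.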