Spatial Coding for Microphone Arrays Using IPNLMS-Based RTF Estimation
Daniel T. Jones, D. Sharma, S. Kruchinin, P. Naylor
2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 17 October 2021
DOI: 10.1109/WASPAA52581.2021.9632747
Abstract
We propose a method for encoding multichannel microphone array signals and show that the proposed algorithm operates effectively at very low bitrates. Our approach exploits the high interchannel correlation that arises from the close spacing of microphones in an array to represent the signals compactly. An $M$-channel microphone array signal is encoded as one reference signal and $M-1$ Relative Transfer Functions (RTFs). When the RTFs need to be updated only infrequently, the data rate is significantly reduced. Applications of interest include cloud-based beamforming and End-to-End Automatic Speech Recognition (ASR) systems. The efficiency of this encoding enables multichannel audio to be transmitted to the cloud at very low bitrates. We have developed a system that estimates, and periodically updates, the RTF between each channel of the array and a chosen reference channel using an Improved Proportionate Normalized Least Mean Squares (IPNLMS) adaptive filter. The proposed system is evaluated experimentally against the Opus codec. It achieves the same ΔPESQ performance with a data-rate reduction of up to 90%, and shows no degradation in Word Error Rate (WER) at bitrates as low as 3.3 kbps.
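The abstract names IPNLMS as the RTF estimator but gives no implementation details. The sketch below is a minimal, illustrative time-domain version. It assumes real-valued signals, a causal FIR RTF of fixed length, no quantization, and illustrative parameter values (`mu`, `alpha`, `delta`). The functions `ipnlms_rtf` and `apply_rtf` are hypothetical names, not the authors' code. The update follows the standard IPNLMS rule of Benesty and Gay: the reference channel is the filter input, another microphone is the desired signal, and a decoder could then rebuild that microphone by filtering the reference with the estimated RTF.

```python
import random
from typing import List, Sequence


def ipnlms_rtf(
    reference: Sequence[float],
    target: Sequence[float],
    num_taps: int,
    mu: float = 0.5,      # illustrative step size (0 < mu < 2)
    alpha: float = 0.0,   # illustrative proportionality mix in [-1, 1]
    delta: float = 1e-3,  # illustrative regularization
    eps: float = 1e-6,    # avoids division by zero when h is all zeros
) -> List[float]:
    """Hypothetical sketch: estimate FIR taps h so that
    target[n] ~= sum_l h[l] * reference[n - l], using the IPNLMS update."""
    h = [0.0] * num_taps
    buf = [0.0] * num_taps  # buf[l] holds reference[n - l]
    for x_n, d_n in zip(reference, target):
        # Shift the newest reference sample into the regressor vector.
        buf = [x_n] + buf[:-1]
        e_n = d_n - sum(hl * bl for hl, bl in zip(h, buf))
        # Proportionate gains: a uniform (NLMS-like) term plus a term
        # proportional to each tap's current magnitude.
        l1 = sum(abs(hl) for hl in h)
        gains = [
            (1 - alpha) / (2 * num_taps) + (1 + alpha) * abs(hl) / (2 * l1 + eps)
            for hl in h
        ]
        norm = sum(g * b * b for g, b in zip(gains, buf)) + delta
        step = mu * e_n / norm
        h = [hl + step * g * b for hl, g, b in zip(h, gains, buf)]
    return h


def apply_rtf(reference: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal FIR filtering: out[n] = sum_l h[l] * reference[n - l]."""
    return [
        sum(h[l] * reference[n - l] for l in range(len(h)) if n - l >= 0)
        for n in range(len(reference))
    ]


if __name__ == "__main__":
    random.seed(0)
    h_true = [0.0, 0.6, -0.3, 0.0, 0.1, 0.0, 0.0, 0.05]  # hypothetical sparse RTF
    ref = [random.gauss(0.0, 1.0) for _ in range(3000)]
    mic2 = apply_rtf(ref, h_true)             # synthetic, noise-free second mic
    h_est = ipnlms_rtf(ref, mic2, num_taps=len(h_true))
    mic2_hat = apply_rtf(ref, h_est)          # decoder-side reconstruction
    print("max tap error:", max(abs(a - b) for a, b in zip(h_est, h_true)))
    print("max sample error:", max(abs(a - b) for a, b in zip(mic2_hat, mic2)))
```

In this toy setting only `num_taps` coefficients per non-reference channel would need to be transmitted, and only when they are re-estimated, which is the source of the data-rate saving described above. Filter length, update interval, tap quantization, and any delay handling for non-causal RTFs are left as assumptions here.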