Time Difference of Arrival Estimation with Deep Learning – From Acoustic Simulations to Recorded Data

2020 IEEE 22nd International Workshop on Multimedia Signal Processing (MMSP). Pub Date: 2020-09-21. DOI: 10.1109/MMSP48831.2020.9287131

Pasi Pertilä, Mikko Parviainen, V. Myllylä, A. Huttunen, P. Jarske

Citations: 2

Abstract

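The spatial information about a sound source is carried by acoustic waves to a microphone array and can be observed by estimating the phase and amplitude differences between microphones. The time difference of arrival (TDoA) captures the propagation delay of the wavefront between microphones and can be used to steer a beamformer or to localize the source. However, reverberation and interference can degrade the TDoA estimate. Deep neural networks (DNNs) trained with supervised learning can extract speech-related TDoAs in more adverse conditions than traditional correlation-based methods. Acoustic simulations provide large amounts of annotated data, whereas real recordings require manual annotation or reference sensors with proper calibration procedures. The distributions of these two data sources can differ: when a DNN model trained on simulated data is presented with real data from a different distribution, its performance decreases unless the mismatch is properly addressed. To reduce the DNN-based TDoA estimation error, this work investigates the role of different input normalization techniques, the mixing of simulated and real data for training, and the application of an adversarial domain adaptation technique. The results quantify the reduction in TDoA error on real data for each approach and show that normalization methods, domain adaptation, and the use of real data during training can all reduce the TDoA error.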
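
The abstract contrasts DNN-based estimation with traditional correlation-based methods. As a point of reference, the sketch below implements one widely used correlation-based estimator, the generalized cross-correlation with phase transform (GCC-PHAT), for a single microphone pair, and converts the delay into a far-field angle via tau = d cos(theta) / c. It is an illustration of the classical baseline under simple assumptions (two single-channel NumPy arrays at the same sample rate, far-field source, speed of sound 343 m/s), not the method or baseline configuration used in the paper; the function names gcc_phat_tdoa and tdoa_to_angle_deg are hypothetical.

```python
import numpy as np


def gcc_phat_tdoa(x1, x2, fs, max_tau=None):
    """Estimate how much x2 lags x1 (in seconds) with GCC-PHAT.

    A positive result means the wavefront reached microphone 1 first.
    """
    n = len(x1) + len(x2)  # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        # Only search delays that the array geometry allows.
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index i corresponds to lag (i - max_shift).
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs


def tdoa_to_angle_deg(tau, d, c=343.0):
    """Far-field angle (degrees) between the source direction and the mic-2-to-mic-1 axis."""
    cos_theta = np.clip(c * tau / d, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


# Usage (assumed setup): white noise, second channel delayed by 5 samples.
rng = np.random.default_rng(0)
fs = 16000
x1 = rng.standard_normal(4096)
x2 = np.concatenate((np.zeros(5), x1[:-5]))
tau = gcc_phat_tdoa(x1, x2, fs, max_tau=0.2 / 343.0)  # assumed 0.2 m mic spacing
print(tau * fs, tdoa_to_angle_deg(tau, d=0.2))
```

Passing max_tau = d / c limits the peak search to delays that are physically possible for the pair, a common way to reject spurious peaks from reverberation. In reverberant rooms such spurious peaks can still win inside that range, which is the kind of degradation the abstract attributes to correlation-based methods.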

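
The abstract does not name the input normalization techniques that were compared. The sketch below shows two generic options for a (frames × features) input matrix, chosen only for illustration: per-utterance mean and variance normalization, and normalization with global statistics fitted on training data. The function names utterance_mvn, fit_global_stats and global_mvn are hypothetical, and the random matrices stand in for real DNN input features.

```python
import numpy as np


def utterance_mvn(features, eps=1e-8):
    """Normalize each feature over the frames of one utterance (axis 0 = frames)."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)


def fit_global_stats(train_utterances):
    """Per-feature mean and std over all frames of all training utterances."""
    stacked = np.concatenate(train_utterances, axis=0)
    return stacked.mean(axis=0), stacked.std(axis=0)


def global_mvn(features, mean, std, eps=1e-8):
    """Normalize with fixed statistics, e.g. ones fitted on simulated training data."""
    return (features - mean) / (std + eps)


# Usage with random stand-in features (200 frames x 8 features).
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))
shifted = feats + 3.0  # constant offset, like a fixed gain change in log-magnitude features
mean, std = fit_global_stats([feats[:100], feats[100:]])

print(np.allclose(utterance_mvn(feats), utterance_mvn(shifted)))  # offset is removed
print(global_mvn(shifted, mean, std).mean(axis=0))  # offset survives as roughly 3 / std
```

Per-utterance normalization removes any constant per-feature offset, such as the shift that a fixed gain change produces in log-magnitude features, whereas global statistics fitted on simulated data are applied unchanged to real recordings. Which choice helps more when training on simulations and testing on recordings is an empirical question; the abstract reports only that normalization reduced the TDoA error.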
