Modal expansion-based data generation approach for deep learning-enabled sound source localization in a small enclosure

IF 3.4 | Zone 2 (Physics and Astrophysics) | Q1 ACOUSTICS

Applied Acoustics Pub Date : 2025-08-28 DOI:10.1016/j.apacoust.2025.111023

Rendong Pi, Xiang Yu

Applied Acoustics, Volume 241, Article 111023. Journal Article. https://www.sciencedirect.com/science/article/pii/S0003682X25004955

Citations: 0

Abstract

Accurately locating sound-emitting objects in small, confined spaces is an important but challenging topic in Sound Source Localization (SSL). Most traditional SSL methods are physics-based and lack accuracy in noisy and reverberant environments. Deep learning-based approaches have recently emerged, but they typically require large training datasets and reliable data generation tools. To address these needs, methods for generating SSL datasets, such as the Image Source Method (ISM), have been developed; these can model large acoustic spaces with moderate reverberation. In small confined spaces, however, audio signals generated by these methods may fail to capture the dominant features of the sound field because of strong modal behavior. In this work, we investigate SSL in small spaces by employing the Modal Expansion (ME) method to generate training datasets. A general workflow is established first, applicable to a range of similar problems that share modal-dominated features. To validate the method, we choose a representative shoebox model with rigid walls. The sound field in the enclosure, specifically the Frequency Response Functions (FRFs), is calculated using the proposed method and numerical simulations, and compared with experimental measurements. The response functions relating any receiver and source positions within the enclosure are then transformed into Impulse Response Functions (IRFs) for comprehensive dataset generation. To evaluate the effectiveness of the proposed method, we conduct a series of SSL experiments that demonstrate the capabilities of the proposed dataset generation tools. A neural network is trained, and its prediction accuracy is assessed with extensive validation datasets. This work proposes a promising deep learning method for sound source localization in small spaces. Our related code is available at https://github.com/Devin-Pi/modal-expansion-for-ssl.
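
To make the FRF-to-IRF step concrete, the following is a minimal sketch of the textbook modal expansion for a rigid-walled rectangular enclosure, followed by an inverse real FFT. It is not the authors' implementation (their code is at the repository linked above). The function names, the enclosure dimensions, the air properties, the mode truncation `n_max`, and the small modal damping ratio `zeta` are all assumptions introduced for illustration; the abstract does not state them.

```python
import numpy as np

def rigid_box_frf(src, rec, dims, freqs, c=343.0, rho0=1.21, zeta=0.01, n_max=8):
    """Pressure FRF per unit source volume velocity (freqs in Hz, all > 0).

    Assumed values: speed of sound c, air density rho0, modal damping
    ratio zeta, and truncation at n_max modes per axis.
    """
    dims = np.asarray(dims, dtype=float)
    # All mode-index triples (nx, ny, nz) with 0 <= n <= n_max.
    grid = np.meshgrid(*[np.arange(n_max + 1)] * 3, indexing="ij")
    idx = np.stack(grid, axis=-1).reshape(-1, 3)
    # Natural angular frequencies of the rigid-walled box.
    omega_n = c * np.pi * np.sqrt(((idx / dims) ** 2).sum(axis=1))
    # Cosine mode shapes evaluated at the source and receiver positions.
    psi_s = np.cos(idx * np.pi * np.asarray(src) / dims).prod(axis=1)
    psi_r = np.cos(idx * np.pi * np.asarray(rec) / dims).prod(axis=1)
    # Mode normalisation: factor 1 for a zero index, 1/2 otherwise, per axis.
    lam = np.where(idx == 0, 1.0, 0.5).prod(axis=1)
    omega = 2 * np.pi * np.asarray(freqs, dtype=float)[:, None]
    # Damped modal denominator, e^{+j omega t} time convention.
    denom = omega_n**2 - omega**2 + 2j * zeta * omega_n * omega
    modal_sum = ((psi_s * psi_r / lam) / denom).sum(axis=1)
    return 1j * omega[:, 0] * rho0 * c**2 / dims.prod() * modal_sum

def frf_to_irf(src, rec, dims, fs=4000.0, n_samples=2048, **kwargs):
    """Sample the FRF on the rFFT frequency grid and invert it to an IRF."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    H = np.zeros(freqs.size, dtype=complex)
    # The uniform (0, 0, 0) mode diverges at 0 Hz, so the DC bin stays zero.
    H[1:] = rigid_box_frf(src, rec, dims, freqs[1:], **kwargs)
    return np.fft.irfft(H, n=n_samples)

dims = (1.0, 0.8, 0.6)  # hypothetical enclosure size in metres
irf = frf_to_irf(src=(0.2, 0.3, 0.1), rec=(0.8, 0.5, 0.4), dims=dims)
```

Convolving a dry source signal with `irf` for many source and receiver pairs would give one way to assemble a training set. The truncation here is a cube in mode-index space, so `n_max` should be chosen so that the retained modes cover the frequency band of interest.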


Source journal

Applied Acoustics (Physics: Acoustics)

CiteScore: 7.40

Self-citation rate: 11.80%

Articles published: 618

Review time: 7.5 months

Journal description: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience in the following ways: • Complete Papers • Short Technical Notes • Review Articles; and thereby provides a wealth of technological information that can be used to solve related problems. Manuscripts that address all fields of applications of acoustics ranging from medicine and NDT to the environment and buildings are welcome.