Modal expansion-based data generation approach for deep learning-enabled sound source localization in a small enclosure

IF 3.4 | CAS Zone 2 (Physics and Astrophysics) | JCR Q1 (Acoustics)
Rendong Pi, Xiang Yu
Journal: Applied Acoustics, vol. 241, Article 111023
Publication date: 2025-08-28
DOI: 10.1016/j.apacoust.2025.111023
URL: https://www.sciencedirect.com/science/article/pii/S0003682X25004955
Citations: 0

Abstract

Accurately locating sound-emitting objects in small, confined spaces is an important but challenging topic within the field of Sound Source Localization (SSL). Most traditional SSL methods are physics-based and struggle to deal accurately with noisy and reverberant environments. Recently, deep learning-based approaches have emerged, but they typically require large amounts of training data and reliable data generation tools. To address these needs, methods for generating SSL datasets, such as the Image Source Method (ISM), have been developed, which are capable of modeling large acoustic spaces with moderate reverberation. However, in small confined acoustic spaces, audio signals generated by these methods may fail to capture the dominant features of the sound field because of strong modal behavior. In this work, we investigate SSL in small spaces by employing the Modal Expansion (ME) method to generate training datasets. The general workflow is established first and is applicable to a range of similar problems with common mode-dominated features. To validate the method, we choose a representative shoebox model with rigid walls. The sound field in the enclosure, specifically the Frequency Response Functions (FRFs), is calculated using the proposed method and numerical simulations, and compared with experimental measurements. The response functions that relate any receiver and source positions within the enclosure are then transformed into Impulse Response Functions (IRFs) for comprehensive dataset generation. To evaluate the effectiveness of the proposed method, we conduct a series of SSL experiments that demonstrate the capabilities of the proposed dataset generation tools. A neural network is trained, and its prediction accuracy is assessed with extensive validation datasets. This work proposes a promising deep learning method for sound source localization in small spaces. Our related code is available at https://github.com/Devin-Pi/modal-expansion-for-ssl.
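The FRF-to-IRF step described in the abstract can be illustrated with a minimal sketch. It uses the textbook modal sum for a rigid-walled rectangular enclosure with a point monopole source and is not taken from the authors' repository. The room dimensions, the uniform modal damping ratio `zeta`, the mode truncation `n_max`, the sampling rate `fs`, and the function names `modal_frf` and `modal_irf` are all illustrative assumptions.

```python
import itertools
import numpy as np


def modal_frf(src, rec, dims, freqs, n_max=8, c=343.0, rho0=1.21, zeta=0.01):
    """Pressure per unit source volume velocity in a rigid-walled box.

    Textbook modal sum; zeta is an assumed uniform modal damping ratio
    that keeps the response finite at resonance.
    """
    src, rec, dims = (np.asarray(v, dtype=float) for v in (src, rec, dims))
    omega = 2.0 * np.pi * np.asarray(freqs, dtype=float)
    volume = np.prod(dims)
    total = np.zeros(omega.shape, dtype=complex)
    for n in itertools.product(range(n_max + 1), repeat=3):
        n = np.array(n)
        # Natural angular frequency of mode (nx, ny, nz).
        omega_n = c * np.pi * np.sqrt(np.sum((n / dims) ** 2))
        # Cosine mode shapes evaluated at the source and the receiver.
        psi_s = np.prod(np.cos(n * np.pi * src / dims))
        psi_r = np.prod(np.cos(n * np.pi * rec / dims))
        # Mode normalization: factor 1 per zero index, 1/2 otherwise.
        norm = np.prod(np.where(n == 0, 1.0, 0.5))
        denom = omega_n**2 - omega**2 + 2j * zeta * omega_n * omega
        total += psi_s * psi_r / (norm * denom)
    return 1j * omega * rho0 * c**2 / volume * total


def modal_irf(src, rec, dims, fs=4000, n_samples=4096, **kwargs):
    """Impulse response from the one-sided FRF via an inverse real FFT."""
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    frf = np.zeros(freqs.shape, dtype=complex)
    # The DC bin is left at zero: the (0, 0, 0) mode is singular there.
    frf[1:] = modal_frf(src, rec, dims, freqs[1:], **kwargs)
    return np.fft.irfft(frf, n=n_samples)


# Hypothetical 1.0 m x 0.8 m x 0.6 m enclosure, positions in metres.
room = (1.0, 0.8, 0.6)
irf = modal_irf(src=(0.2, 0.3, 0.1), rec=(0.7, 0.5, 0.4), dims=room)
print(irf.shape)
```

Convolving a dry source signal with such an IRF for each microphone position would give one simulated multichannel training sample. Because the modal sum is truncated and band-limited, the resulting IRF is only an approximation of the physical impulse response.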
Source journal: Applied Acoustics (Physics, Acoustics)
CiteScore: 7.40
Self-citation rate: 11.80%
Articles published: 618
Review time: 7.5 months
Journal description: Since its launch in 1968, Applied Acoustics has been publishing high-quality research papers providing state-of-the-art coverage of research findings for engineers and scientists involved in applications of acoustics in the widest sense. Applied Acoustics looks not only at recent developments in the understanding of acoustics but also at ways of exploiting that understanding. The Journal aims to encourage the exchange of practical experience through publication and in so doing creates a fund of technological information that can be used for solving related problems. The presentation of information in graphical or tabular form is especially encouraged. If a report of a mathematical development is a necessary part of a paper, it is important to ensure that it is there only as an integral part of a practical solution to a problem and is supported by data. Applied Acoustics encourages the exchange of practical experience through Complete Papers, Short Technical Notes, and Review Articles, and thereby provides a wealth of technological information that can be used to solve related problems. Manuscripts that address all fields of applications of acoustics, ranging from medicine and NDT to the environment and buildings, are welcome.