A dual-encoder U-net architecture with prior knowledge embedding for acoustic source mapping
Haobo Jia, Feiran Yang, Xiaoqing Hu, Jun Yang
Journal of the Acoustical Society of America, vol. 158, no. 3, pp. 1767-1782, published 2025-09-01
DOI: 10.1121/10.0039104
Citations: 0
Abstract
The deconvolution approach has become a standard method for high-resolution acoustic source mapping, but it suffers from a heavy computational burden. Deep learning-based methods have shown promising progress but often rely on a single type of input feature and ignore the position- and frequency-dependent variability of the point spread function (PSF), which degrades localization accuracy. This paper proposes a supervised learning framework based on a dual-encoder U-net architecture that converts beamforming maps into a high-resolution map of the true source strength distribution. Specifically, the model employs two individual encoders to extract complementary features from delay-and-sum and functional beamforming maps. Because the two maps provide distinct information on the same source strength distribution, a contrastive loss function is introduced to help the two encoders learn consistent latent features of the sources. To characterize the PSF variations, a frequency encoder and a position encoder are designed to embed prior knowledge, i.e., source frequency and grid positions, into the backbone network. The proposed model outperforms competing methods, on average, across four metrics on the simulation data and the MIRACLE dataset, and it generalizes well across different numbers of sound sources and frequencies.
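The two input maps the model consumes can be illustrated with a small, self-contained sketch. The code below simulates one point source, builds its cross-spectral matrix under a free-field monopole model, and computes both a delay-and-sum map and a Dougherty-style functional beamforming map (which raises the eigenvalues of the cross-spectral matrix to the power 1/ν and the beamformer output to ν). The 16-microphone array geometry, source position, frequency, and exponent ν are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 16 microphones in the z=0 plane, one point source.
f = 2000.0            # source frequency in Hz (assumed)
c = 343.0             # speed of sound, m/s
mics = rng.uniform(-0.5, 0.5, size=(16, 2))
mics = np.column_stack([mics, np.zeros(16)])
src = np.array([0.1, -0.2, 1.0])   # true source position (assumed, on the grid)

k = 2 * np.pi * f / c

def steering(pos):
    """Free-field steering vector for a monopole at `pos`."""
    r = np.linalg.norm(mics - pos, axis=1)
    return np.exp(-1j * k * r) / r

# Cross-spectral matrix (CSM) of a single unit-strength source, no noise.
a = steering(src)
C = np.outer(a, a.conj())

# Scan grid in the source plane z = 1.0 m.
xs = np.linspace(-0.5, 0.5, 21)
ys = np.linspace(-0.5, 0.5, 21)

def das_map(C):
    """Delay-and-sum map: w^H C w with unit-norm steering weights."""
    out = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            w = steering(np.array([x, y, 1.0]))
            w /= np.linalg.norm(w)
            out[i, j] = np.real(w.conj() @ C @ w)
    return out

def functional_map(C, nu=16):
    """Functional beamforming: replace C by C^(1/nu) via its
    eigendecomposition, then raise the beamformer output to nu."""
    lam, V = np.linalg.eigh(C)
    lam = np.clip(lam, 0, None)
    C_root = (V * lam ** (1.0 / nu)) @ V.conj().T
    out = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            w = steering(np.array([x, y, 1.0]))
            w /= np.linalg.norm(w)
            out[i, j] = np.real(w.conj() @ C_root @ w) ** nu
    return out

B_das = das_map(C)
B_fun = functional_map(C)

# Both maps peak at the grid point nearest the true source; the functional
# map has a sharper main lobe, which is why the two maps carry
# complementary information about the same source distribution.
peak_das = np.unravel_index(B_das.argmax(), B_das.shape)
peak_fun = np.unravel_index(B_fun.argmax(), B_fun.shape)
```

In the paper's framework these two maps would be fed to the two separate encoders; the sketch only shows why pairing them is useful (same peak locations, different lobe widths and sidelobe behavior).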
About the Journal
Since 1929, The Journal of the Acoustical Society of America has been the leading source of theoretical and experimental research results in the broad interdisciplinary study of sound. Subject coverage includes: linear and nonlinear acoustics; aeroacoustics, underwater sound, and acoustical oceanography; ultrasonics and quantum acoustics; architectural and structural acoustics and vibration; speech, music, and noise; psychology and physiology of hearing; engineering acoustics and transduction; and bioacoustics, including animal bioacoustics.