STWWgram-ODCBAM: Multimodal feature fusion and dynamic attention mechanism for anomalous sound detection

IF 3.6 | Zone 2, Engineering & Technology | Q2 | ENGINEERING, ELECTRICAL & ELECTRONIC

Signal Processing Pub Date : 2025-08-05 DOI:10.1016/j.sigpro.2025.110218

Libin Zheng, Dongsheng Liu, Tong Wu, Yahui Chen

Signal Processing, vol. 239, Article 110218. https://www.sciencedirect.com/science/article/pii/S0165168425003329

Citations: 0

Abstract

Anomalous sound detection (ASD) aims to identify abnormal acoustic patterns emitted by machines or devices, enabling the timely detection of potential malfunctions. In recent years, various approaches have been proposed to extract both temporal and spectral features from audio data to improve detection performance. However, simply concatenating these features often leads to high-dimensional representations containing redundant information, which increases the risk of overfitting and hinders model performance. To address this issue, we propose a novel model based on a dynamic attention mechanism that adaptively selects and emphasizes informative temporal and spectral features while suppressing irrelevant noise. This enhances the quality of feature representation and improves the accuracy of anomaly detection. Moreover, we design a joint learning architecture that simultaneously captures multimodal features from both time and frequency domains, enabling the model to better capture the complex nature of audio signals and enrich the expressiveness of acoustic features. Experimental results demonstrate that the proposed method significantly outperforms state-of-the-art approaches on the DCASE 2020 Challenge Task 2 dataset, achieving AUC and mAUC improvements of 0.40% and 0.88%, respectively. Notably, for the challenging ToyConveyor machine type, our method achieves a remarkable 5.2% improvement in AUC, demonstrating strong robustness and generalization capability.
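The AUC figures quoted above measure how reliably a detector's anomaly scores rank anomalous clips above normal ones. As a minimal sketch of that metric, and not the authors' evaluation code, the pairwise-ranking form of AUC can be written in plain Python. The function name `auc` and the example scores are hypothetical and chosen only for illustration.

```python
def auc(normal_scores, anomalous_scores):
    """Area under the ROC curve, computed by pairwise ranking.

    Equals the fraction of (anomalous, normal) pairs in which the
    anomalous clip receives the higher anomaly score; ties count as half.
    """
    if not normal_scores or not anomalous_scores:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for a in anomalous_scores:
        for n in normal_scores:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(anomalous_scores) * len(normal_scores))


# Hypothetical anomaly scores (higher means "more anomalous").
normal = [0.10, 0.20, 0.35, 0.40]
anomalous = [0.30, 0.60, 0.80]

# 10 of the 12 pairs are ranked correctly, so this prints roughly 0.833.
print(auc(normal, anomalous))
```

A value of 1.0 means every anomalous clip outscores every normal clip, and 0.5 corresponds to chance-level ranking. The quadratic loop is kept for readability; a sort-based version would be preferable for large test sets.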



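Source journal

Signal Processing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 9.20

Self-citation rate: 9.10%

Articles published: 309

Review time: 41 days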

Journal description: Signal Processing incorporates all aspects of the theory and practice of signal processing. It features original research work, tutorial and review articles, and accounts of practical developments. It is intended for the rapid dissemination of knowledge and experience to engineers and scientists working in the research, development or practical application of signal processing. Subject areas covered by the journal include: Signal Theory; Stochastic Processes; Detection and Estimation; Spectral Analysis; Filtering; Signal Processing Systems; Software Developments; Image Processing; Pattern Recognition; Optical Signal Processing; Digital Signal Processing; Multi-dimensional Signal Processing; Communication Signal Processing; Biomedical Signal Processing; Geophysical and Astrophysical Signal Processing; Earth Resources Signal Processing; Acoustic and Vibration Signal Processing; Data Processing; Remote Sensing; Signal Processing Technology; Radar Signal Processing; Sonar Signal Processing; Industrial Applications; New Applications.