Robust Classification of Urban Sounds in Noisy Environments: A Novel Approach Using SPWVD-MFCC and Dual-Stream Classifier

IF 1.8, Zone 4 (Physics and Astrophysics)

Acoustics Australia Pub Date : 2025-03-24 DOI:10.1007/s40857-025-00350-6

Bo Peng, Kevin I-Kai Wang, Waleed H. Abdulla

Acoustics Australia, vol. 53, no. 2, pp. 253-268. Journal Article. Full text: https://link.springer.com/article/10.1007/s40857-025-00350-6

Citations: 0

Abstract

Urban sound classification is essential for effective sound monitoring and mitigation strategies, which are critical to addressing the negative impacts of noise pollution on public health. While existing methods predominantly rely on Short-Time Fourier Transform (STFT)-based features like Mel-Frequency Cepstral Coefficients (MFCC), these approaches often struggle to identify the dominant sound in noisy environments. This gap in robustness limits the practical deployment of such systems in real-world urban settings, where noise levels are unpredictable and variable. Here, we introduce Smoothed Pseudo-Wigner–Ville Distribution-based MFCC (SPWVD-MFCC), a novel feature that merges SPWVD’s high time–frequency resolution with MFCC’s human-like auditory sensitivity. We further propose a dual-stream ResNet50-CNN-LSTM architecture to classify these features. Comprehensive experiments conducted on the UrbanSound8K, UrbanSoundPlus, and DCASE2016 datasets demonstrate that the proposed SPWVD-MFCC significantly improves classification accuracy in noisy conditions, with an enhancement of up to 37.2% over traditional STFT-based methods and better robustness than existing approaches. These results indicate that the proposed approach addresses a critical gap in urban sound classification by providing enhanced robustness in low-SNR environments. This advancement improves the reliability of urban noise monitoring systems and contributes to the broader goal of creating healthier urban living environments by enabling more effective noise-control strategies.
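The abstract does not spell out how SPWVD-MFCC is computed, so the following is only a minimal sketch of one plausible reading: replace the STFT power spectrogram in a standard MFCC pipeline with a discrete SPWVD, then apply a mel filterbank, a logarithm, and a DCT. The function names (`spwvd`, `mel_filterbank`, `spwvd_mfcc`), the Hamming windows and their lengths, the hop size, and the clipping of negative SPWVD values before the logarithm are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import get_window, hilbert


def spwvd(x, n_fft=256, time_win=9, lag_win=63, hop=16):
    """Discrete SPWVD of a real signal.

    Returns an (n_fft, n_frames) array whose bins span 0 .. fs/2.
    time_win and lag_win must be odd, and lag_win must be < n_fft.
    """
    z = hilbert(x)  # analytic signal: removes negative-frequency cross-terms
    g = get_window("hamming", time_win, fftbins=False)
    g = g / g.sum()  # time-smoothing window, normalised to unit sum
    h = get_window("hamming", lag_win, fftbins=False)  # lag window
    lg, lh = time_win // 2, lag_win // 2
    pad = lg + lh
    zp = np.pad(z, pad)  # zero-pad so every index below stays in range
    p = np.arange(-lg, lg + 1)
    centers = np.arange(0, len(x), hop)
    tfr = np.zeros((n_fft, len(centers)))
    for col, n in enumerate(centers):
        c = n + pad
        kernel = np.zeros(n_fft, dtype=complex)
        for m in range(-lh, lh + 1):
            # time-smoothed local autocorrelation at lag 2*m
            r = np.sum(g * zp[c + p + m] * np.conj(zp[c + p - m]))
            kernel[m % n_fft] = h[m + lh] * r
        # the kernel is Hermitian in m, so its FFT is real up to rounding
        tfr[:, col] = np.fft.fft(kernel).real
    return tfr


def mel_filterbank(n_mels, freqs, fmax):
    """Triangular mel filters evaluated on an arbitrary frequency grid (Hz)."""
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = mel2hz(np.linspace(hz2mel(0.0), hz2mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def spwvd_mfcc(x, fs, n_mels=26, n_mfcc=13, **spwvd_kwargs):
    """MFCC-style coefficients computed from an SPWVD instead of an STFT."""
    # Assumption: the SPWVD can go negative, so clip before taking the log.
    tfr = np.maximum(spwvd(x, **spwvd_kwargs), 0.0)
    n_bins = tfr.shape[0]
    # Lag step is two samples, so bin k sits at k * fs / (2 * n_bins).
    freqs = np.arange(n_bins) * fs / (2.0 * n_bins)
    fb = mel_filterbank(n_mels, freqs, fs / 2.0)
    log_mel = np.log(fb @ tfr + 1e-10)
    return dct(log_mel, type=2, axis=0, norm="ortho")[:n_mfcc]


if __name__ == "__main__":
    fs = 8000
    t = np.arange(2048) / fs
    features = spwvd_mfcc(np.cos(2 * np.pi * 1000 * t), fs)
    print(features.shape)
```

The separate time window `g` and lag window `h` are what distinguish the SPWVD from a plain Wigner-Ville distribution: lengthening either one suppresses more cross-term interference at the cost of resolution along that axis, which is the trade-off a noise-robust feature would need to tune.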


Source journal

Acoustics Australia ACOUSTICS-

Self-citation rate

5.90%

Number of articles published

About the journal: Acoustics Australia, the journal of the Australian Acoustical Society, has been publishing high-quality research and technical papers in all areas of acoustics since its commencement in 1972. The target audience for the journal includes both researchers and practitioners. It aims to publish papers and technical notes that are relevant to current acoustics and of interest to members of the Society. These include but are not limited to: Architectural and Building Acoustics, Environmental Noise, Underwater Acoustics, Engineering Noise and Vibration Control, Occupational Noise Management, Hearing, Musical Acoustics.