Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs

Journal of Autonomous Intelligence. Pub Date: 2023-08-28. DOI: 10.32629/jai.v6i3.678

Valentina Franzoni



Abstract

This paper reviews an approach to sound classification that applies image processing techniques to spectrogram representations of audio signals. It shows that well-established image processing methods, such as filtering, segmentation, and pattern recognition, can improve feature extraction and classification performance once audio signals are transformed into spectrograms. We give an overview of the mathematical methods shared by image processing and spectrogram-based audio processing, focusing on the principles, techniques, and algorithms the two domains have in common. The methodology relies in particular on convolutional neural networks (CNNs) to extract and classify time-frequency features from spectrograms, capitalizing on their hierarchical feature learning and their robustness to translation and scale variations. Other deep-learning networks and advanced techniques are suggested in the course of the analysis. We discuss the benefits and limitations of transforming audio signals into spectrograms, including human interpretability, compatibility with image processing techniques, and flexibility in time-frequency resolution. By bridging image processing and audio processing, spectrogram-based audio deep learning offers a deeper perspective on sound classification and a foundation for interdisciplinary research and applications in both domains.
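To make the audio-to-image step concrete, the following is a minimal sketch of how a one-dimensional signal becomes a two-dimensional log-magnitude spectrogram that a CNN could consume like a grayscale image. It is not taken from the paper: the function name `log_spectrogram`, the parameters `frame_len` and `hop`, the Hann window, and the synthetic 1 kHz test tone are all illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram of shape (freq_bins, frames).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One real FFT per frame gives the magnitude at each frequency bin.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression, then transpose so rows are frequency, columns are time.
    return np.log1p(magnitude).T


# Synthetic input (an assumption): one second of a 1 kHz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

# Short frames: finer time resolution, coarser frequency resolution.
spec = log_spectrogram(tone, frame_len=256, hop=128)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256

# Long frames: finer frequency resolution, fewer time steps.
wide = log_spectrogram(tone, frame_len=1024, hop=512)

print(spec.shape, wide.shape, peak_hz)
```

Changing `frame_len` and `hop` is one way to see the flexibility in time-frequency resolution that the abstract mentions: longer frames resolve frequency more finely but yield fewer time steps, and the resulting array shape is what an image-style model would receive as input.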

