Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs

Valentina Franzoni
{"title":"Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs","authors":"Valentina Franzoni","doi":"10.32629/jai.v6i3.678","DOIUrl":null,"url":null,"abstract":"In this paper, the innovative approach to sound classification by exploiting the potential of image processing techniques applied to spectrogram representations of audio signals is reviewed. This study shows the effectiveness of incorporating well-established image processing methodologies, such as filtering, segmentation, and pattern recognition, to enhance the feature extraction and classification performance of audio signals when transformed into spectrograms. An overview is provided of the mathematical methods shared by both image and spectrogram-based audio processing, focusing on the commonalities between the two domains in terms of the underlying principles, techniques, and algorithms. The proposed methodology leverages in particular the power of convolutional neural networks (CNNs) to extract and classify time-frequency features from spectrograms, capitalizing on the advantages of their hierarchical feature learning and robustness to translation and scale variations. Other deep-learning networks and advanced techniques are suggested during the analysis. We discuss the benefits and limitations of transforming audio signals into spectrograms, including human interpretability, compatibility with image processing techniques, and flexibility in time-frequency resolution. By bridging the gap between image processing and audio processing, spectrogram-based audio deep learning gives a deeper perspective on sound classification, offering fundamental insights that serve as a foundation for interdisciplinary research and applications in both domains.","PeriodicalId":70721,"journal":{"name":"自主智能(英文)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"自主智能(英文)","FirstCategoryId":"1093","ListUrlMain":"https://doi.org/10.32629/jai.v6i3.678","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this paper, the innovative approach to sound classification by exploiting the potential of image processing techniques applied to spectrogram representations of audio signals is reviewed. This study shows the effectiveness of incorporating well-established image processing methodologies, such as filtering, segmentation, and pattern recognition, to enhance the feature extraction and classification performance of audio signals when transformed into spectrograms. An overview is provided of the mathematical methods shared by both image and spectrogram-based audio processing, focusing on the commonalities between the two domains in terms of the underlying principles, techniques, and algorithms. The proposed methodology leverages in particular the power of convolutional neural networks (CNNs) to extract and classify time-frequency features from spectrograms, capitalizing on the advantages of their hierarchical feature learning and robustness to translation and scale variations. Other deep-learning networks and advanced techniques are suggested during the analysis. We discuss the benefits and limitations of transforming audio signals into spectrograms, including human interpretability, compatibility with image processing techniques, and flexibility in time-frequency resolution. By bridging the gap between image processing and audio processing, spectrogram-based audio deep learning gives a deeper perspective on sound classification, offering fundamental insights that serve as a foundation for interdisciplinary research and applications in both domains.
跨域协同:利用图像处理技术,通过使用cnn的频谱图分析来增强声音分类
本文综述了利用图像处理技术对音频信号的频谱表示的潜力来进行声音分类的创新方法。本研究表明,结合成熟的图像处理方法,如滤波、分割和模式识别,在转换为频谱图时提高音频信号的特征提取和分类性能是有效的。概述了基于图像和基于频谱图的音频处理所共享的数学方法,重点介绍了这两个领域在基本原理、技术和算法方面的共性。所提出的方法特别利用卷积神经网络(cnn)从频谱图中提取和分类时频特征的能力,利用其分层特征学习和对平移和尺度变化的鲁棒性的优势。在分析过程中提出了其他深度学习网络和先进技术。我们讨论了将音频信号转换为频谱图的好处和局限性,包括人类的可解释性,与图像处理技术的兼容性以及时频分辨率的灵活性。通过弥合图像处理和音频处理之间的差距,基于频谱图的音频深度学习为声音分类提供了更深入的视角,为这两个领域的跨学科研究和应用奠定了基础。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
0.40
自引率
0.00%
发文量
25
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信