Cross-domain synergy: Leveraging image processing techniques for enhanced sound classification through spectrogram analysis using CNNs
Valentina Franzoni
自主智能(英文), published 2023-08-28. DOI: 10.32629/jai.v6i3.678 (https://doi.org/10.32629/jai.v6i3.678)
This paper reviews an approach to sound classification that applies image processing techniques to spectrogram representations of audio signals. It shows that well-established image processing methods, such as filtering, segmentation, and pattern recognition, can improve feature extraction and classification performance once audio signals are transformed into spectrograms. An overview is given of the mathematical methods shared by image processing and spectrogram-based audio processing, focusing on the principles, techniques, and algorithms the two domains have in common. The methodology relies in particular on convolutional neural networks (CNNs) to extract and classify time-frequency features from spectrograms, taking advantage of their hierarchical feature learning and robustness to translation and scale variations. Other deep-learning networks and advanced techniques are suggested in the course of the analysis. The benefits and limitations of transforming audio signals into spectrograms are discussed, including human interpretability, compatibility with image processing techniques, and flexibility in time-frequency resolution. By bridging image processing and audio processing, spectrogram-based audio deep learning offers a deeper perspective on sound classification and a foundation for interdisciplinary research and applications in both domains.
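To make the audio-to-spectrogram step and its flexibility in time-frequency resolution concrete, the following is a minimal sketch and not the paper's implementation. It assumes a Hann-windowed short-time Fourier transform computed with NumPy. The function name `log_spectrogram`, the 8 kHz sample rate, the frame and hop lengths, and the 1 kHz test tone are all illustrative assumptions.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-power spectrogram (in dB) with shape (freq_bins, frames).

    Hypothetical helper for illustration: a Hann-windowed STFT.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT of each frame: frame_len // 2 + 1 frequency bins.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Transpose so rows are frequency and columns are time, like an image.
    return 10.0 * np.log10(power + eps).T


# Assumed test input: one second of a 1 kHz sine sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

# Short frames: finer time resolution, coarser frequency resolution.
spec = log_spectrogram(tone, frame_len=256, hop=128)
# Long frames: finer frequency resolution, fewer time steps.
spec_long = log_spectrogram(tone, frame_len=1024, hop=512)

# Locate the strongest frequency bin and convert it back to hertz.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256

print(spec.shape, spec_long.shape, peak_hz)
```

The resulting 2-D array can be treated like a single-channel image, which is what makes image filters and CNN layers applicable to it. Changing `frame_len` trades time resolution against frequency resolution, as the two output shapes show.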