Acoustic Scene Classification Using Various Features and DNN Model: A Monolithic and Hierarchical Approach
Chandrasekhar Paseddula, Suryakanth V. Gangashetty
Circuits, Systems and Signal Processing, published 2024-09-06. DOI: 10.1007/s00034-024-02836-6 (https://doi.org/10.1007/s00034-024-02836-6)
An acoustic scene is a complex phenomenon, so it is difficult to extract scene-specific information from its foreground and background sound sources. More study is still required to accurately discern sound scenes and pinpoint the distinct sound events in realistic soundscapes. Finding a good feature representation is helpful for acoustic scene classification (ASC). This study investigated several common acoustic features for ASC: mel-frequency cepstral coefficients (MFCC), log-mel band energy (LOGMEL), linear prediction cepstral coefficients (LPCC), and all-pole group delay (APGD). To represent acoustic scenes, we also proposed a variety of features drawn from speaker and music recognition: inverted mel-frequency cepstral coefficients, spectral centroid magnitude coefficients, sub-band spectral flux coefficients, and single frequency filtering cepstral coefficients. Using DNN classification models, we investigated how these features affect the classification of acoustic scenes in the DCASE 2017 dataset. Our analysis shows that no single feature performed better than the others across all acoustic scenes. In general, it may be challenging for a single classifier to identify every class correctly as the number of acoustic scenes grows. We have therefore proposed a two-level hierarchical classification approach: the first level determines the meta-category of the acoustic scene, and the second level performs the fine-grained classification within that meta-category. Without DNN score fusion at level 2 as post-processing, the hierarchical approach (81.0%) performed better than the monolithic classification approach (79.9%). The performance of the ASC system can be further improved by exploring more sophisticated complementary features. A monolithic system based on the fusion of MFCC and LOGMEL features achieved an accuracy of 90.5%. The proposed hierarchical system achieves an accuracy of 82.6% with DNN score fusion at level 2 as post-processing.
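The two-level decision with score fusion at level 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the fusion rule, the meta-categories, or the model outputs, so the weighted averaging, the meta-category names (`indoor`, `outdoor`, `vehicle`), the scene labels, and all function names below are assumptions chosen only for illustration. The per-class scores stand in for the outputs of feature-specific DNNs (for example, one trained on MFCC and one on LOGMEL).

```python
from typing import Dict, List, Optional, Tuple

# Maps a class label to a posterior-like score produced by one model.
Scores = Dict[str, float]


def fuse_scores(score_sets: List[Scores],
                weights: Optional[List[float]] = None) -> Scores:
    """Weighted average of per-class scores from several models.

    Averaging is an assumed fusion rule; the abstract only says that
    DNN scores are fused at level 2.
    """
    if not score_sets:
        raise ValueError("need at least one score set")
    if weights is None:
        weights = [1.0] * len(score_sets)
    if len(weights) != len(score_sets):
        raise ValueError("one weight per score set is required")
    total = sum(weights)
    labels = score_sets[0].keys()
    return {
        label: sum(w * s[label] for w, s in zip(weights, score_sets)) / total
        for label in labels
    }


def argmax_label(scores: Scores) -> str:
    """Return the label with the highest score."""
    return max(scores, key=lambda label: scores[label])


def hierarchical_predict(level1: Scores,
                         level2: Dict[str, List[Scores]]) -> Tuple[str, str]:
    """Pick the meta-category first, then the scene within it.

    level1: scores over meta-categories.
    level2: for each meta-category, the score sets of the feature-specific
            fine-grained models, fused before the final decision.
    """
    meta = argmax_label(level1)
    fused = fuse_scores(level2[meta])
    return meta, argmax_label(fused)


# Hypothetical scores for one audio clip (illustrative numbers only).
level1_scores = {"indoor": 0.2, "outdoor": 0.7, "vehicle": 0.1}
level2_scores = {
    "indoor": [{"home": 0.5, "office": 0.5}, {"home": 0.6, "office": 0.4}],
    "outdoor": [{"park": 0.6, "beach": 0.4}, {"park": 0.3, "beach": 0.7}],
    "vehicle": [{"bus": 0.8, "tram": 0.2}, {"bus": 0.7, "tram": 0.3}],
}

print(hierarchical_predict(level1_scores, level2_scores))
```

In this toy input the two level-2 models disagree on the outdoor scene, and the fused scores decide between them. A monolithic system would instead apply a single argmax over all scene labels at once.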
About the journal:
Rapid developments in the analog and digital processing of signals for communication, control, and computer systems have made the theory of electrical circuits and signal processing a burgeoning area of research and design. The aim of Circuits, Systems, and Signal Processing (CSSP) is to help meet the need for outlets for significant research papers and state-of-the-art review articles in the area.
The scope of the journal is broad, ranging from mathematical foundations to practical engineering design. It encompasses, but is not limited to, such topics as linear and nonlinear networks, distributed circuits and systems, multi-dimensional signals and systems, analog filters and signal processing, digital filters and signal processing, statistical signal processing, multimedia, computer aided design, graph theory, neural systems, communication circuits and systems, and VLSI signal processing.
The Editorial Board is international, and papers are welcome from throughout the world. The journal is devoted primarily to research papers, but survey, expository, and tutorial papers are also published.
Circuits, Systems, and Signal Processing (CSSP) is published twelve times annually.