Input Fusion of MFCC and SCMC Features for Acoustic Scene Classification using DNN
Chandrasekhar Paseddula, S. Gangashetty
2018 IEEE 13th International Conference on Industrial and Information Systems (ICIIS), published 2018-12-01
DOI: https://doi.org/10.1109/ICIINFS.2018.8721416
In this paper, we propose a feature set formed by concatenating Mel-Frequency Cepstral Coefficient (MFCC) and Spectral Centroid Magnitude Coefficient (SCMC) features for Acoustic Scene Classification (ASC) using Deep Neural Networks (DNN). MFCC features capture acoustic characteristics such as the spectral envelope of an acoustic scene in each frame, and they represent the average energy of each sub-band as a single dimension. SCMC features complement them by capturing how energy is distributed within each sub-band. We evaluate the approach on the Tampere University of Technology (TUT) Acoustic Scenes 2017 dataset, using a DNN architecture for utterance-level classification. On a 4-fold cross-validation setup, the proposed system achieves a performance of 80.2%, a 5.4% relative improvement over the baseline system that uses log-Mel band energies with a Multi-Layer Perceptron model.
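The input-fusion idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the filterbank, the SCMC formula, or the feature dimensions. The sketch assumes equal-width rectangular sub-bands (real MFCCs use a triangular mel filterbank), a frequency-weighted average magnitude as the spectral centroid magnitude, and an unnormalized DCT-II. The names `subband_features`, `dct2`, and `fused_features`, and the values `n_bands=8` and `n_coeffs=6`, are hypothetical.

```python
import numpy as np


def subband_features(mag, n_bands=8):
    """Per-frame sub-band statistics from a magnitude spectrogram.

    mag: array of shape (n_frames, n_bins).
    Returns (avg_energy, scm), each of shape (n_frames, n_bands).
    Assumption: equal-width rectangular bands, not a mel filterbank.
    """
    n_frames, n_bins = mag.shape
    edges = np.linspace(0, n_bins, n_bands + 1).astype(int)
    # 1-based bin index used as a stand-in for frequency (assumption).
    freqs = np.arange(1, n_bins + 1, dtype=float)
    avg_energy = np.empty((n_frames, n_bands))
    scm = np.empty((n_frames, n_bands))
    for k in range(n_bands):
        lo, hi = edges[k], edges[k + 1]
        band = mag[:, lo:hi]
        f = freqs[lo:hi]
        # One value per band: the average energy, blind to where it sits.
        avg_energy[:, k] = np.mean(band ** 2, axis=1)
        # Frequency-weighted average magnitude: sensitive to where the
        # energy lies inside the band.
        scm[:, k] = band @ f / f.sum()
    return avg_energy, scm


def dct2(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def fused_features(mag, n_bands=8, n_coeffs=6, eps=1e-10):
    """Concatenate MFCC-like and SCMC-like coefficients frame by frame."""
    avg_energy, scm = subband_features(mag, n_bands)
    mfcc_like = dct2(np.log(avg_energy + eps), n_coeffs)
    scmc_like = dct2(np.log(scm + eps), n_coeffs)
    # Input fusion: one feature vector per frame, fed to a single model.
    return np.concatenate([mfcc_like, scmc_like], axis=1)


# Synthetic magnitude spectrogram: 5 frames, 64 frequency bins.
rng = np.random.default_rng(0)
mag = np.abs(rng.standard_normal((5, 64)))
fused = fused_features(mag)
print(fused.shape)  # (5, 12)
```

Under these assumptions, two frames whose energy sits at opposite ends of the same sub-band have identical average energy but different centroid magnitudes. This is the kind of complementary information that concatenation passes to the classifier.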