What Affects the Performance of Convolutional Neural Networks for Audio Event Classification

Helin Wang, Dading Chong, Dongyan Huang, Yuexian Zou

2019 8th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), September 2019. DOI: 10.1109/ACIIW.2019.8925277
Convolutional neural networks (CNNs) have played an important role in Audio Event Classification (AEC). Both 1D-CNN and 2D-CNN methods have been applied to improve AEC classification accuracy, and many factors affect the performance of CNN-based models. In this paper, we study the factors affecting the performance of CNNs for AEC, including the sampling rate, the signal segmentation method, the window size, the number of mel bins, and the filter size. The segmentation method of the event signal is a particularly important one: because audio events usually last only a short duration, a poor choice can lead to overfitting. We propose a signal segmentation method called Fill-length Processing to address this problem. Based on our study of these factors, we design convolutional neural networks for audio event classification (called FPNet). On the environmental sound dataset ESC-50, FPNet-1D and FPNet-2D achieve classification accuracies of 73.90% and 85.10% respectively, a significant improvement over previous methods.
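The abstract does not spell out how Fill-length Processing works; a plausible reading is that a short audio event is extended (e.g. by repetition) to a fixed input length instead of being zero-padded, so the network sees event content rather than silence. The sketch below is a minimal, hypothetical illustration under that assumption; the function name, the repetition strategy, and the truncation of longer clips are all assumptions, not the authors' specification.

```python
import numpy as np

def fill_length(signal: np.ndarray, target_len: int) -> np.ndarray:
    """Hypothetical sketch of a fill-length segmentation step.

    Assumption: short events are tiled (repeated) until they fill
    `target_len` samples; longer clips are truncated. The paper's
    actual Fill-length Processing may differ in detail.
    """
    if len(signal) == 0:
        raise ValueError("signal must be non-empty")
    if len(signal) >= target_len:
        return signal[:target_len]  # truncate clips that are long enough
    # Repeat the event enough times to cover target_len, then trim.
    reps = int(np.ceil(target_len / len(signal)))
    return np.tile(signal, reps)[:target_len]
```

With this kind of preprocessing, every training example has the same duration, so a fixed-size 1D-CNN input (or a fixed-size mel spectrogram for a 2D-CNN) can be computed without padding short events with silence.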