Bird Species Classification with Audio-Visual Data using CNN and Multiple Kernel Learning
B. Naranchimeg, Chao Zhang, T. Akashi
2019 International Conference on Cyberworlds (CW), 2019-10-01
DOI: 10.1109/CW.2019.00022 (https://doi.org/10.1109/CW.2019.00022)
Deep convolutional neural networks (CNNs) have recently become a standard tool in many machine learning applications, in audio processing as well as in image processing. However, most studies explore only a single type of training data. In this paper, we present a study on classifying bird species by combining deep neural features of both visual and audio data using a kernel-based fusion method. Specifically, we extract deep neural features from the activation values of an inner layer of a CNN, and we combine these features by multiple kernel learning (MKL) to perform the final classification. In the experiments, we train and evaluate our method on the standard CUB-200-2011 data set combined with an audio data set that we collected ourselves, covering 200 bird species (classes). The experimental results indicate that our CNN+MKL method, which uses both modalities, outperforms single-modality methods, some simple kernel combination methods, and the conventional early fusion method.
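To make the kernel-based fusion idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that each modality has already been reduced to a feature vector (the toy numbers below are hypothetical stand-ins for CNN activations), builds one RBF kernel per modality, and weights the kernels by kernel-target alignment, a simple heuristic used here in place of a full MKL optimizer. The class-mean similarity classifier is likewise an illustrative assumption; the abstract does not specify the MKL solver or the final classifier.

```python
import math

def rbf_kernel(A, B, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))
             for z in B] for x in A]

def alignment(K, labels):
    """Kernel-target alignment between K and an ideal kernel T,
    where T[i][j] = 1 if samples i and j share a label, else 0."""
    n = len(labels)
    T = [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
         for i in range(n)]
    dot = sum(K[i][j] * T[i][j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    norm_t = math.sqrt(sum(v * v for row in T for v in row))
    return dot / (norm_k * norm_t)

def kernel_weights(train_kernels, labels):
    """Heuristic kernel weights: alignment scores normalized to sum to 1."""
    scores = [alignment(K, labels) for K in train_kernels]
    total = sum(scores)
    return [s / total for s in scores]

def combine(kernels, weights):
    """Weighted sum of kernel matrices that share the same shape."""
    rows, cols = len(kernels[0]), len(kernels[0][0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(cols)] for i in range(rows)]

def predict(K_test_train, labels):
    """Assign each test sample to the class with the highest mean
    combined-kernel similarity to that class's training samples."""
    classes = sorted(set(labels))
    preds = []
    for row in K_test_train:
        def mean_sim(c):
            vals = [k for k, y in zip(row, labels) if y == c]
            return sum(vals) / len(vals)
        preds.append(max(classes, key=mean_sim))
    return preds

# Hypothetical toy features standing in for per-modality CNN activations.
visual_train = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
audio_train = [[0.0], [0.2], [1.0], [0.8]]
labels = [0, 0, 1, 1]
visual_test = [[0.05, 0.05], [0.95, 0.95]]
audio_test = [[0.1], [0.9]]

# One kernel per modality, weighted on the training set only.
train_kernels = [rbf_kernel(visual_train, visual_train),
                 rbf_kernel(audio_train, audio_train)]
weights = kernel_weights(train_kernels, labels)

# Apply the same weights to the test-versus-train kernels.
test_kernels = [rbf_kernel(visual_test, visual_train),
                rbf_kernel(audio_test, audio_train)]
predictions = predict(combine(test_kernels, weights), labels)
print(weights, predictions)
```

The point of the sketch is the contrast with early fusion: instead of concatenating the visual and audio vectors into one input, each modality keeps its own kernel, and only the kernel matrices are combined, so the relative contribution of each modality is an explicit weight.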