Christine Sénac, Thomas Pellegrini, Florian Mouret, J. Pinquier
{"title":"基于卷积神经网络的音乐类型分类特征映射","authors":"Christine Sénac, Thomas Pellegrini, Florian Mouret, J. Pinquier","doi":"10.1145/3095713.3095733","DOIUrl":null,"url":null,"abstract":"Nowadays, deep learning is more and more used for Music Genre Classification: particularly Convolutional Neural Networks (CNN) taking as entry a spectrogram considered as an image on which are sought different types of structure. But, facing the criticism relating to the difficulty in understanding the underlying relationships that neural networks learn in presence of a spectrogram, we propose to use, as entries of a CNN, a small set of eight music features chosen along three main music dimensions: dynamics, timbre and tonality. With CNNs trained in such a way that filter dimensions are interpretable in time and frequency, results show that only eight music features are more efficient than 513 frequency bins of a spectrogram and that late score fusion between systems based on both feature types reaches 91% accuracy on the GTZAN database.","PeriodicalId":310224,"journal":{"name":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"51","resultStr":"{\"title\":\"Music Feature Maps with Convolutional Neural Networks for Music Genre Classification\",\"authors\":\"Christine Sénac, Thomas Pellegrini, Florian Mouret, J. Pinquier\",\"doi\":\"10.1145/3095713.3095733\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, deep learning is more and more used for Music Genre Classification: particularly Convolutional Neural Networks (CNN) taking as entry a spectrogram considered as an image on which are sought different types of structure. But, facing the criticism relating to the difficulty in understanding the underlying relationships that neural networks learn in presence of a spectrogram, we propose to use, as entries of a CNN, a small set of eight music features chosen along three main music dimensions: dynamics, timbre and tonality. With CNNs trained in such a way that filter dimensions are interpretable in time and frequency, results show that only eight music features are more efficient than 513 frequency bins of a spectrogram and that late score fusion between systems based on both feature types reaches 91% accuracy on the GTZAN database.\",\"PeriodicalId\":310224,\"journal\":{\"name\":\"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"51\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3095713.3095733\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3095713.3095733","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Music Feature Maps with Convolutional Neural Networks for Music Genre Classification
Nowadays, deep learning is more and more used for Music Genre Classification: particularly Convolutional Neural Networks (CNN) taking as entry a spectrogram considered as an image on which are sought different types of structure. But, facing the criticism relating to the difficulty in understanding the underlying relationships that neural networks learn in presence of a spectrogram, we propose to use, as entries of a CNN, a small set of eight music features chosen along three main music dimensions: dynamics, timbre and tonality. With CNNs trained in such a way that filter dimensions are interpretable in time and frequency, results show that only eight music features are more efficient than 513 frequency bins of a spectrogram and that late score fusion between systems based on both feature types reaches 91% accuracy on the GTZAN database.