On the Fusion of Multiple Audio Representations for Music Genre Classification
Diego Furtado Silva, Micael Valterlânio da Silva, Ricardo Szram Filho, A. Silva
DOI: 10.5753/sbcm.2021.19423 (https://doi.org/10.5753/sbcm.2021.19423)
Published in: Anais do XVIII Simpósio Brasileiro de Computação Musical (SBCM 2021)
Publication date: 2021-10-24
Citations: 0
Abstract
Music classification is one of the most studied tasks in music information retrieval, and music genre is one of the most widely targeted labels for this task. In this scenario, deep neural networks have produced the current state-of-the-art results. Most research in this domain relies on a single feature to represent the audio at the input of the classification model. Given the nature of the task, researchers usually choose time-frequency representations, especially those designed to make timbre more explicit. However, the audio processing literature offers many strategies for building representations that reveal other characteristics of music, such as key and tempo, which may contribute relevant information to genre classification. We present an exploratory study of different neural network fusion techniques for music genre classification with multiple features as input. Our results demonstrate that Multi-Feature Fusion Networks consistently improve classification accuracy for suitable choices of input representations.