Predicting song genre with deep learning
Authors: Antonijo Marijić, Marina Bagić Babac
Journal: Global Knowledge Memory and Communication (JCR Q2, Information Science & Library Science; impact factor 2.1)
Published: 2023-03-28
DOI: https://doi.org/10.1108/gkmc-08-2022-0187
Citations: 3
Abstract
Purpose
Classifying songs by genre from their lyrics is challenging even for humans; however, state-of-the-art natural language processing has recently offered advanced solutions to this task. This study aims to advance the understanding and application of natural language processing and deep learning in music genre classification, while also contributing to the broader themes of global knowledge and communication and the sustainable preservation of cultural heritage.
Design/methodology/approach
The main contribution of this study is the development and evaluation of various machine learning and deep learning models for song genre classification. The authors also investigated the effect of different word embeddings, including Global Vectors for Word Representation (GloVe) and Word2Vec, on classification performance. The tested models range from benchmarks such as logistic regression, support vector machines and random forests, through neural network architectures such as recurrent neural networks, long short-term memory and bidirectional long short-term memory, to transformer-based models such as bidirectional encoder representations from transformers (BERT).
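One common way to feed pretrained word vectors such as GloVe or Word2Vec to a benchmark classifier is to average the vectors of the words in a lyric. The sketch below illustrates that idea only; it is not taken from the study. The embedding table `TOY_EMBEDDINGS`, its three-dimensional values and the helper names `embed_lyrics` and `cosine` are all invented for illustration, whereas real pretrained vectors are loaded from a file and typically have 50 to 300 dimensions.

```python
from math import sqrt

# Hypothetical toy vectors standing in for pretrained GloVe/Word2Vec embeddings.
# The words and values are invented; real embeddings are loaded from disk.
TOY_EMBEDDINGS = {
    "darkness": [0.9, 0.1, 0.0],
    "blood":    [0.8, 0.2, 0.1],
    "love":     [0.1, 0.9, 0.2],
    "baby":     [0.0, 0.8, 0.3],
    "tonight":  [0.2, 0.7, 0.4],
}


def embed_lyrics(text, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of all known words; unknown words are skipped."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vectors:
        # No known word: fall back to the zero vector.
        return [0.0] * dim
    # zip(*vectors) iterates over dimensions, so each column is averaged.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


metal_like = embed_lyrics("darkness and blood")
pop_like = embed_lyrics("love me tonight baby")
similarity = cosine(metal_like, pop_like)
```

With these toy vectors the two lyrics land far apart (low cosine similarity), which is the kind of separation a downstream classifier such as logistic regression could exploit. Recurrent and transformer models would instead consume the word sequence rather than an average.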
Findings
The authors conducted experiments on both English and multilingual data sets for genre classification. The results show that the BERT model achieved the best accuracy on the English data set, whereas the cross-lingual language model based on RoBERTa (XLM-RoBERTa) performed best on the multilingual data set. Songs in the metal genre were the most accurately labeled, as their text style and topics were the most distinct from those of other genres. In contrast, songs from the pop and rock genres were more difficult to tell apart. This study also compared the impact of different word embeddings on the classification task and found that models with GloVe word embeddings outperformed those using Word2Vec or an embedding layer learned during training.
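Per-genre findings of this kind are usually read from a confusion matrix. The sketch below shows one way to compute the fraction of correctly labeled songs per genre (per-class recall) and to count how often one genre is mistaken for another. The labels are invented for illustration and are not results from the study; the function names `per_genre_recall` and `confusion_counts` are hypothetical.

```python
from collections import Counter


def per_genre_recall(y_true, y_pred):
    """Fraction of songs of each true genre that received the correct label."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {genre: correct[genre] / totals[genre] for genre in totals}


def confusion_counts(y_true, y_pred):
    """Count (true genre, predicted genre) pairs: a sparse confusion matrix."""
    return Counter(zip(y_true, y_pred))


# Invented labels for illustration only; these are NOT the study's data.
y_true = ["metal", "metal", "metal", "pop", "pop", "pop", "rock", "rock", "rock"]
y_pred = ["metal", "metal", "metal", "pop", "rock", "pop", "pop", "rock", "pop"]

recall = per_genre_recall(y_true, y_pred)
confusion = confusion_counts(y_true, y_pred)

for genre, value in recall.items():
    print(f"{genre}: {value:.2f}")
print("rock predicted as pop:", confusion[("rock", "pop")])
print("pop predicted as rock:", confusion[("pop", "rock")])
```

In this toy example every metal song is labeled correctly, while the off-diagonal counts concentrate on the pop/rock pair, mirroring the pattern described above.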
Originality/value
This study presents the implementation, testing and comparison of various machine learning and deep learning models for genre classification. The results demonstrate that transformer models, including BERT, the robustly optimized BERT pretraining approach (RoBERTa), distilled BERT (DistilBERT), bidirectional and auto-regressive transformers (BART) and XLM-RoBERTa, outperformed the other models.