Predicting song genre with deep learning

Impact factor: 2.4; JCR Q2, Information Science & Library Science

Global Knowledge Memory and Communication. Pub Date: 2023-03-28. DOI: 10.1108/gkmc-08-2022-0187

Antonijo Marijić, Marina Bagić Babac


Citations: 3

Abstract

Purpose
Genre classification of songs based on lyrics is a challenging task even for humans; however, state-of-the-art natural language processing has recently offered advanced solutions. The purpose of this study is to advance the understanding and application of natural language processing and deep learning in the domain of music genre classification, while also contributing to the broader themes of global knowledge and communication and the sustainable preservation of cultural heritage.

Design/methodology/approach
The main contribution of this study is the development and evaluation of various machine learning and deep learning models for song genre classification. Additionally, the authors investigated the effect of different word embeddings, including Global Vectors for Word Representation (GloVe) and Word2Vec, on classification performance. The tested models range from benchmarks such as logistic regression, support vector machine and random forest, to more complex neural network architectures such as recurrent neural network (RNN), long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM), and to transformer-based models such as bidirectional encoder representations from transformers (BERT).

Findings
The authors conducted experiments on both English and multilingual data sets for genre classification. The results show that the BERT model achieved the best accuracy on the English data set, whereas cross-lingual language model pretraining based on RoBERTa (XLM-RoBERTa) performed best on the multilingual data set. Songs in the metal genre were the most accurately labeled, as their text style and topics were the most distinct from those of other genres. In contrast, songs from the pop and rock genres were more challenging to differentiate. This study also compared the impact of different word embeddings on the classification task and found that models with GloVe word embeddings outperformed those using Word2Vec or an embedding layer learned during training.

Originality/value
This study presents the implementation, testing and comparison of various machine learning and deep learning models for genre classification. The results demonstrate that transformer models, including BERT, robustly optimized BERT pretraining approach (RoBERTa), distilled bidirectional encoder representations from transformers (DistilBERT), bidirectional and auto-regressive transformers (BART) and XLM-RoBERTa, outperformed the other models.
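The per-genre findings in the abstract (metal labeled most accurately, pop and rock hard to tell apart) are the kind of result usually read off per-class accuracy and a tally of confused genre pairs. The following is a minimal Python sketch of that bookkeeping, not the authors' evaluation code; the function names `per_genre_recall` and `confusion_counts` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def per_genre_recall(true_labels, predicted_labels):
    """Return {genre: fraction of that genre's songs labeled correctly}."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return {genre: correct[genre] / totals[genre] for genre in totals}


def confusion_counts(true_labels, predicted_labels):
    """Count misclassifications as {(true_genre, predicted_genre): n}."""
    return Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )


# Toy labels invented for illustration only (assumption);
# these are NOT results from the study.
y_true = ["metal", "metal", "pop", "pop", "rock", "rock"]
y_pred = ["metal", "metal", "rock", "pop", "pop", "rock"]

recall = per_genre_recall(y_true, y_pred)
confusions = confusion_counts(y_true, y_pred)

for genre, value in sorted(recall.items()):
    print(f"{genre}: {value:.2f}")
for (true_genre, predicted_genre), n in sorted(confusions.items()):
    print(f"{true_genre} labeled as {predicted_genre}: {n}")
```

With these toy labels, every metal song is labeled correctly while pop and rock are each mistaken for the other once, which mirrors the pattern the abstract describes without reproducing any figure from the study.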




Source journal

Global Knowledge Memory and Communication (Information Science & Library Science)

CiteScore

4.20

Self-citation rate

16.70%

Articles published