Musical Key Classification Using Convolutional Neural Network Based on Extended Constant-Q Chromagram

S. Chivapreecha, Tantep Sinjanakhom, A. Trirat
{"title":"Musical Key Classification Using Convolutional Neural Network Based on Extended Constant-Q Chromagram","authors":"S. Chivapreecha, Tantep Sinjanakhom, A. Trirat","doi":"10.1109/ISPACS57703.2022.10082833","DOIUrl":null,"url":null,"abstract":"In the field of music information retrieval, musical key classification is one of the challenges. This paper illustrates the advantages of the proposed system with relevant experimental results, starting with diverse audio datasets for feature extraction used for training and testing a classification model which is based on a convolutional neural network (CNN). The goal is to develop a feature that can improve the neural network's performance. To compare the effect of input features on efficiency, a basic CNN is trained from the ground up and utilized as an image classification tool. The Chromagram-24, an augmented version of the input chroma feature, is proposed to improve the accuracy of musical key detection. In terms of weighted score, the model using Chromagram-24 as an input feature outperforms the model trained using a conventional 12-dimensional chromagram by 12.77% and achieves the highest score of 85.63% when classifying full-length songs. Chromagrams are generated using audio excerpts ranging in length from 15 to 60 seconds for local key estimation, whereas, for global key estimation, a full-length audio set is used. 
The results indicate that, given the different lengths of training audio input, executing the model using a chromagram of a 60-second audio excerpt yields the best results.","PeriodicalId":410603,"journal":{"name":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","volume":"568 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPACS57703.2022.10082833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In the field of music information retrieval, musical key classification remains a challenging task. This paper demonstrates the advantages of the proposed system through experimental results, starting from diverse audio datasets from which features are extracted for training and testing a classification model based on a convolutional neural network (CNN). The goal is to develop an input feature that improves the network's performance. To compare the effect of input features on accuracy, a basic CNN is trained from scratch and used as an image classifier. The Chromagram-24, an extended version of the input chroma feature, is proposed to improve the accuracy of musical key detection. In terms of weighted score, the model using Chromagram-24 as an input feature outperforms the model trained on a conventional 12-dimensional chromagram by 12.77%, and achieves the highest score of 85.63% when classifying full-length songs. Chromagrams are generated from audio excerpts of 15 to 60 seconds for local key estimation, whereas full-length audio is used for global key estimation. The results indicate that, among the different training input lengths, the model using chromagrams of 60-second excerpts performs best.
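To give a rough sense of what a 24-bin chroma feature looks like, the sketch below folds an STFT magnitude spectrum into 24 quarter-tone pitch-class bins. This is a simplified stand-in, not the paper's method: the authors use an extended constant-Q chromagram, and the exact extraction parameters are not given in the abstract, so the function name, FFT settings, and the C2 reference frequency here are illustrative assumptions.

```python
import numpy as np

def chromagram_24(y, sr, n_fft=4096, hop=1024, fmin=65.4, n_octaves=6):
    """Fold an STFT magnitude spectrum into 24 quarter-tone pitch-class
    bins; a simplified stand-in for an extended constant-Q chromagram.
    fmin defaults to ~C2 (65.4 Hz)."""
    # Windowed short-time Fourier transform magnitudes
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))           # (frames, fft_bins)

    # Keep frequencies inside the analysed pitch range
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    valid = (freqs >= fmin) & (freqs < fmin * 2 ** n_octaves)
    # Map each frequency to one of 24 quarter-tone classes (mod one octave)
    classes = np.round(24 * np.log2(freqs[valid] / fmin)).astype(int) % 24

    # Accumulate spectral energy per pitch class
    chroma = np.zeros((n_frames, 24))
    np.add.at(chroma.T, classes, mag[:, valid].T)
    # Normalise each frame so the feature is loudness-invariant
    norms = np.linalg.norm(chroma, axis=1, keepdims=True)
    return chroma / np.maximum(norms, 1e-9)
```

Feeding a pure 440 Hz tone (A4) through this function concentrates energy in a single quarter-tone class, which is the kind of pitch-class evidence the CNN then classifies as an image.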