Musical Key Classification Using Convolutional Neural Network Based on Extended Constant-Q Chromagram

S. Chivapreecha, Tantep Sinjanakhom, A. Trirat
{"title":"Musical Key Classification Using Convolutional Neural Network Based on Extended Constant-Q Chromagram","authors":"S. Chivapreecha, Tantep Sinjanakhom, A. Trirat","doi":"10.1109/ISPACS57703.2022.10082833","DOIUrl":null,"url":null,"abstract":"In the field of music information retrieval, musical key classification is one of the challenges. This paper illustrates the advantages of the proposed system with relevant experimental results, starting with diverse audio datasets for feature extraction used for training and testing a classification model which is based on a convolutional neural network (CNN). The goal is to develop a feature that can improve the neural network's performance. To compare the effect of input features on efficiency, a basic CNN is trained from the ground up and utilized as an image classification tool. The Chromagram-24, an augmented version of the input chroma feature, is proposed to improve the accuracy of musical key detection. In terms of weighted score, the model using Chromagram-24 as an input feature outperforms the model trained using a conventional 12-dimensional chromagram by 12.77% and achieves the highest score of 85.63% when classifying full-length songs. Chromagrams are generated using audio excerpts ranging in length from 15 to 60 seconds for local key estimation, whereas, for global key estimation, a full-length audio set is used. 
The results indicate that, given the different lengths of training audio input, executing the model using a chromagram of a 60-second audio excerpt yields the best results.","PeriodicalId":410603,"journal":{"name":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","volume":"568 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPACS57703.2022.10082833","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

In the field of music information retrieval, musical key classification remains a challenging task. This paper demonstrates the advantages of the proposed system through experimental results, starting from diverse audio datasets from which features are extracted for training and testing a classification model based on a convolutional neural network (CNN). The goal is to develop an input feature that improves the network's performance. To compare the effect of input features on accuracy, a basic CNN is trained from scratch and used as an image classifier. The Chromagram-24, an extended version of the input chroma feature, is proposed to improve the accuracy of musical key detection. In terms of weighted score, the model using Chromagram-24 as an input feature outperforms the model trained on a conventional 12-dimensional chromagram by 12.77%, and achieves the highest score of 85.63% when classifying full-length songs. Chromagrams are generated from audio excerpts of 15 to 60 seconds for local key estimation, whereas full-length audio is used for global key estimation. The results indicate that, among the different training input lengths, the model using chromagrams of 60-second excerpts performs best.
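To give a rough sense of what a 24-bin chroma feature looks like, the sketch below folds an STFT magnitude spectrum into 24 quarter-tone pitch-class bins. This is a simplified stand-in, not the paper's method: the authors use an extended constant-Q chromagram, and the exact extraction parameters are not given in the abstract, so the function name, FFT settings, and the C2 reference frequency here are illustrative assumptions.

```python
import numpy as np

def chromagram_24(y, sr, n_fft=4096, hop=1024, fmin=65.4, n_octaves=6):
    """Fold an STFT magnitude spectrum into 24 quarter-tone pitch-class
    bins; a simplified stand-in for an extended constant-Q chromagram.
    fmin defaults to ~C2 (65.4 Hz)."""
    # Windowed short-time Fourier transform magnitudes
    n_frames = 1 + (len(y) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))           # (frames, fft_bins)

    # Keep frequencies inside the analysed pitch range
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    valid = (freqs >= fmin) & (freqs < fmin * 2 ** n_octaves)
    # Map each frequency to one of 24 quarter-tone classes (mod one octave)
    classes = np.round(24 * np.log2(freqs[valid] / fmin)).astype(int) % 24

    # Accumulate spectral energy per pitch class
    chroma = np.zeros((n_frames, 24))
    np.add.at(chroma.T, classes, mag[:, valid].T)
    # Normalise each frame so the feature is loudness-invariant
    norms = np.linalg.norm(chroma, axis=1, keepdims=True)
    return chroma / np.maximum(norms, 1e-9)
```

Feeding a pure 440 Hz tone (A4) through this function concentrates energy in a single quarter-tone class, which is the kind of pitch-class evidence the CNN then classifies as an image.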