Music signal recognition aids based on convolutional neural networks in music education

Impact factor (IF): 3.6

Systems and Soft Computing | Published: 2025-03-17 | DOI: 10.1016/j.sasc.2025.200219

Xiyuan Gao, Ruohan Gao

Systems and Soft Computing, vol. 7, Article 200219. https://www.sciencedirect.com/science/article/pii/S2772941925000377

Citations: 0

Abstract

With the growing diversity of music information processing needs, music signal recognition technology has become increasingly important in music education and the music industry. This study proposes a music signal recognition aid based on a convolutional neural network (CNN). First, a logarithmic frequency-domain filter bank and a double-layer ReLU network are used to extract pitch features from the music signal. Next, a benchmark CNN model is constructed, and the constant-Q transform (CQT) is applied to the extracted features to generate a harmonic sequence matrix. Finally, a two-level classification strategy is adopted to improve instrument recognition. For pitch feature extraction, the logarithmic frequency-domain filter bank achieved accuracies of 74.59% and 77.03% at frame lengths of 2048 and 8192, respectively, outperforming the double-layer ReLU network. Experiments with different levels of the harmonic mapping matrix showed that the matrix level significantly affected recall and accuracy for individual instruments; for example, the piano reached an F1 score of 0.936. In validating the two-level classification model, overall accuracy rose from 0.848 for the benchmark model to 0.880, indicating better generalization in multi-instrument music signal recognition. The main contributions are improved pitch feature extraction and a more efficient classification model for multi-instrument music signals. These contributions address the research gap in quickly and accurately extracting the pitch and part information of multiple instruments in complex musical works, provide substantial technical support for music analysis and understanding in music education, and advance music information retrieval technology.
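The abstract does not say how the logarithmic frequency-domain filter bank is built, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes triangular filters centred on equal-tempered semitones (MIDI 21 to 108, the piano range), a Hann-windowed magnitude spectrum, a 22.05 kHz sample rate, and that "frame length" means the FFT size in samples. The pitch estimate (the note whose filter collects the most energy) is a deliberate simplification; the study feeds such features to neural networks instead. The names `log_freq_filter_bank` and `estimate_midi_pitch` are hypothetical.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's method): triangular filters on
# equal-tempered semitones, Hann window, pitch = strongest filter output.


def log_freq_filter_bank(n_fft, sr, midi_lo=21, midi_hi=108):
    """Return a (notes x FFT-bins) matrix of semitone-spaced triangular filters.

    Hypothetical helper. Filter i peaks at MIDI note midi_lo + i and falls to zero
    at the neighbouring semitones, so the bank is uniform on a log-frequency axis.
    """
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)      # frequency (Hz) of each rfft bin
    midi = np.arange(midi_lo - 1, midi_hi + 2)          # one extra note on each side as edges
    edges = 440.0 * 2.0 ** ((midi - 69) / 12.0)         # equal temperament, A4 = 440 Hz
    bank = np.zeros((midi_hi - midi_lo + 1, bin_freqs.size))
    for i in range(bank.shape[0]):
        lo, centre, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (bin_freqs - lo) / (centre - lo)
        falling = (hi - bin_freqs) / (hi - centre)
        # Triangle: min of the two slopes; values outside [lo, hi] are negative and clipped to 0.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def estimate_midi_pitch(frame, sr, midi_lo=21, midi_hi=108):
    """Hypothetical helper: MIDI note whose filter collects the most spectral energy."""
    n_fft = len(frame)                                   # assumed: frame length == FFT size
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(n_fft)))
    energies = log_freq_filter_bank(n_fft, sr, midi_lo, midi_hi) @ spectrum
    return midi_lo + int(np.argmax(energies))


if __name__ == "__main__":
    sr = 22050                                           # assumed sample rate
    for n_fft in (2048, 8192):                           # frame lengths from the abstract
        t = np.arange(n_fft) / sr
        tone = np.sin(2 * np.pi * 440.0 * t)             # synthetic A4 test tone
        print(n_fft, estimate_midi_pitch(tone, sr))      # -> 69 (A4) for both lengths
```

Because semitone spacing shrinks at low pitches while FFT bin spacing stays fixed at `sr / n_fft` (about 10.8 Hz for 2048 samples at 22.05 kHz, versus about 2.7 Hz for 8192), low-register filters in a 2048-point frame cover few or no bins. This is one plausible reason, not one the authors state, why longer frames can yield more accurate pitch features.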




Source journal

Systems and Soft Computing

CiteScore

2.20

Self-citation rate

0.00%

Articles published