Deep Learning-Based Optical Music Recognition for Semantic Representation of Non-overlap and Overlap Music Notes

IF 1.2 Q3 MULTIDISCIPLINARY SCIENCES

ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY. Pub Date: 2024-03-11. DOI: 10.14500/aro.11402

Rana L. Abdulazeez, Fattah Alizadeh


Abstract

Optical music recognition (OMR) is the process of teaching a computer to interpret musical notation: it aims to convert an image of a music sheet into a computer-readable format. Recently, the sequence-to-sequence model with an attention mechanism, already used in text and handwriting recognition, has been applied to music note recognition. However, because information gradually vanishes over excessively long sequences of musical sheets, OMR models built on long short-term memory (LSTM) have difficulty learning the relationships among musical notations. Consequently, a new framework is proposed that leverages image segmentation to break the procedure into several steps. This study also addresses the overlap problem in OMR: overlapping can cause music notations to be misinterpreted, producing inaccurate results. Thus, a novel algorithm is proposed to detect and segment notations that are extremely close to each other. Our experiments use a convolutional neural network (CNN) block as a feature extractor on the image of the musical sheet and a sequence-to-sequence model to retrieve the corresponding semantic representation. The proposed approach is evaluated on the Printed Images of Music Staves (PrIMuS) dataset. The results confirm that the suggested framework successfully solves the problem of long-sequence music sheets, obtaining a symbol error rate (SER) of 0% for non-overlapping symbols in the best scenario. Furthermore, the approach shows promising results on the overlapping problem: an SER of 23.12% for overlapping symbols.
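The SER figures quoted in the abstract can be made concrete with a small sketch. The abstract does not spell out how SER is computed, so the code below assumes the definition commonly used in OMR work: the number of symbol insertions, deletions, and substitutions needed to turn the predicted sequence into the reference, divided by the total number of reference symbols. The function names and the example tokens are hypothetical and only loosely modelled on semantic-style labels; they are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (unit costs)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def symbol_error_rate(
    refs: Sequence[Sequence[str]], hyps: Sequence[Sequence[str]]
) -> float:
    """SER in percent over a set of samples (assumed definition, see lead-in)."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must contain the same number of samples")
    total_symbols = sum(len(r) for r in refs)
    if total_symbols == 0:
        raise ValueError("reference sequences contain no symbols")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * total_edits / total_symbols


# Hypothetical token sequences, for illustration only.
reference = ["clef-G2", "timeSignature-2/4", "note-C4_quarter",
             "note-E4_quarter", "barline"]
predicted = ["clef-G2", "timeSignature-2/4", "note-C4_quarter",
             "note-F4_quarter", "barline"]

print(symbol_error_rate([reference], [predicted]))  # one substitution in five symbols
```

Under this definition, an SER of 0% means every predicted symbol sequence matches its reference exactly, while an SER of 23.12% corresponds to roughly 23 edit operations per 100 reference symbols.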

