Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation

Elona Shatri, George Fazekas
arXiv - CS - Sound, published 2024-08-27. DOI: arxiv-2408.15002 (https://doi.org/arxiv-2408.15002)

Abstract

Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats such as MusicXML, MEI, or MIDI, significantly reducing the cost and time of manual transcription. This study explores knowledge discovery in OMR by applying instance segmentation with Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music. Unlike Optical Character Recognition (OCR), OMR must handle the intricate semantics of Common Western Music Notation (CWMN), where a symbol's meaning depends on its shape, position, and context. Our approach leverages instance segmentation to manage the density and overlap of musical symbols, facilitating more precise information retrieval from music scores. Evaluations on the DoReMi and MUSCIMA++ datasets demonstrate substantial improvements: our method achieves a mean Average Precision (mAP) of up to 59.70% in dense symbol environments, comparable to the results of object detection. Furthermore, using traditional computer vision techniques, we add a parallel staff detection step to infer the pitch of the recognised symbols. This study emphasises the role of pixel-wise segmentation in advancing accurate music symbol recognition, contributing to knowledge discovery in OMR. Our findings indicate that instance segmentation provides more precise representations of musical symbols, particularly in densely populated scores, advancing OMR technology. We make our implementation, pre-processing scripts, trained models, and evaluation results publicly available to support further research and development.
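The abstract does not say which traditional computer vision technique is used for staff detection, so the following is only a minimal sketch of one common option: a horizontal projection profile to find staff lines, followed by a mapping from a symbol mask's vertical centroid to a pitch name. The function names (`detect_staff_lines`, `mask_centroid_y`, `staff_step`, `treble_pitch`), the `min_fill` threshold, and the treble-clef assumption are all hypothetical and are not taken from the authors' implementation; key signatures and accidentals are ignored.

```python
import numpy as np


def detect_staff_lines(binary, min_fill=0.5):
    """Return the centre row of each staff line in a binary image (1 = ink).

    Assumption: a row belongs to a staff line if at least `min_fill`
    of its pixels are ink (horizontal projection profile).
    """
    row_fill = binary.mean(axis=1)          # fraction of ink pixels per row
    is_line = row_fill >= min_fill
    centres, start = [], None
    for y, flag in enumerate(is_line):
        if flag and start is None:
            start = y                       # a run of line rows begins
        elif not flag and start is not None:
            centres.append((start + y - 1) / 2.0)  # centre of the finished run
            start = None
    if start is not None:                   # run reaching the last row
        centres.append((start + len(is_line) - 1) / 2.0)
    return centres


def mask_centroid_y(mask):
    """Vertical centroid of a binary instance mask (e.g. a notehead mask)."""
    rows = np.nonzero(mask)[0]
    return float(rows.mean())


def staff_step(centres, y):
    """Diatonic steps above the bottom line of a five-line staff.

    0 is the bottom line, 1 the first space, 2 the second line, and so on.
    Negative values lie below the staff.
    """
    if len(centres) != 5:
        raise ValueError("expected exactly five staff lines")
    half_space = (centres[-1] - centres[0]) / 8.0  # line-to-space distance
    return int(round((centres[-1] - y) / half_space))


def treble_pitch(step):
    """Pitch name for a staff step, assuming a treble clef (bottom line = E4)."""
    names = "CDEFGAB"
    index = 4 * 7 + 2 + step                # diatonic index of E4, plus offset
    return f"{names[index % 7]}{index // 7}"


if __name__ == "__main__":
    # Synthetic staff: five one-pixel lines, ten rows apart.
    image = np.zeros((60, 100), dtype=np.uint8)
    for row in (10, 20, 30, 40, 50):
        image[row, :] = 1
    # Synthetic notehead mask centred on the middle line.
    notehead = np.zeros_like(image)
    notehead[28:33, 40:46] = 1
    lines = detect_staff_lines(image)
    print(treble_pitch(staff_step(lines, mask_centroid_y(notehead))))  # prints B4
```

In a real pipeline the mask would come from the instance segmentation output rather than being drawn by hand, and scores with several staves, skew, or curved lines would need a more robust staff model than a single global projection.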