Knowledge Discovery in Optical Music Recognition: Enhancing Information Retrieval with Instance Segmentation

Elona Shatri, George Fazekas
arXiv:2408.15002 · arXiv - CS - Sound · Published 2024-08-27 · Citations: 0

Abstract

Optical Music Recognition (OMR) automates the transcription of musical notation from images into machine-readable formats like MusicXML, MEI, or MIDI, significantly reducing the costs and time of manual transcription. This study explores knowledge discovery in OMR by applying instance segmentation using Mask R-CNN to enhance the detection and delineation of musical symbols in sheet music. Unlike Optical Character Recognition (OCR), OMR must handle the intricate semantics of Common Western Music Notation (CWMN), where symbol meanings depend on shape, position, and context. Our approach leverages instance segmentation to manage the density and overlap of musical symbols, facilitating more precise information retrieval from music scores. Evaluations on the DoReMi and MUSCIMA++ datasets demonstrate substantial improvements, with our method achieving a mean Average Precision (mAP) of up to 59.70% in dense symbol environments, comparable to object detection. Furthermore, using traditional computer vision techniques, we add a parallel step for staff detection to infer the pitch of the recognised symbols. This study emphasises the role of pixel-wise segmentation in advancing accurate music symbol recognition, contributing to knowledge discovery in OMR. Our findings indicate that instance segmentation provides more precise representations of musical symbols, particularly in densely populated scores, advancing OMR technology. We make our implementation, pre-processing scripts, trained models, and evaluation results publicly available to support further research and development.
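The abstract's parallel staff-detection step is a classic traditional-CV technique: staff lines are long horizontal ink runs, so they can be located by a horizontal projection profile, and a notehead's vertical position relative to those lines then determines its pitch. The paper does not spell out its exact method, so the sketch below is a minimal illustration of that general idea, not the authors' implementation; the row threshold, the uniform-spacing assumption, and the treble-clef pitch table are all assumptions for the example.

```python
import numpy as np

def detect_staff_lines(binary_img, row_threshold=0.5):
    """Find y-coordinates of staff lines via a horizontal projection profile.

    binary_img: 2-D array, 1 = ink (black pixel), 0 = background.
    A row whose ink ratio exceeds row_threshold is treated as part of a
    staff line; consecutive such rows are merged into one line centre.
    """
    ink_ratio = binary_img.mean(axis=1)          # fraction of ink per row
    line_rows = np.where(ink_ratio > row_threshold)[0]
    lines, run = [], list(line_rows[:1])
    for r in line_rows[1:]:
        if r == run[-1] + 1:                     # same thick line, keep merging
            run.append(r)
        else:                                    # gap -> close previous line
            lines.append(int(np.mean(run)))
            run = [r]
    if run:
        lines.append(int(np.mean(run)))
    return lines

def infer_pitch(notehead_y, staff_lines,
                pitches=("F5", "E5", "D5", "C5", "B4", "A4", "G4", "F4", "E4")):
    """Map a notehead centre y to the nearest line-or-space pitch (treble clef).

    staff_lines: the five line y-coordinates, top to bottom. Spacing is
    assumed uniform, so half-spacing steps enumerate lines and spaces
    from the top line (F5) down to the bottom line (E4).
    """
    top, bottom = staff_lines[0], staff_lines[-1]
    half = (bottom - top) / 8.0                  # 4 gaps -> 8 half-steps
    step = round((notehead_y - top) / half)
    step = max(0, min(step, len(pitches) - 1))   # clamp to the staff range
    return pitches[step]
```

On a synthetic staff with lines at y = 20, 30, 40, 50, 60, a notehead centred at y = 40 sits on the middle line and maps to B4. Ledger lines, clef changes, and skewed scans would need extra handling (e.g. deskewing before projection), which is why real systems often pair this with more robust line-tracking.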