Jorge Calvo-Zaragoza, Gabriel Vigliensoni, Ichiro Fujinaga
{"title":"用于音乐文档分析的像素分类","authors":"Jorge Calvo-Zaragoza, Gabriel Vigliensoni, Ichiro Fujinaga","doi":"10.1109/IPTA.2017.8310134","DOIUrl":null,"url":null,"abstract":"Content within musical documents not only contains music symbol but also include different elements such as staff lines, text, or frontispieces. Before attempting to automatically recognize components in these layers, it is necessary to perform an analysis of the musical documents in order to detect and classify each of these constituent parts. The obstacle for this analysis is the high heterogeneity amongst music collections, especially with ancient documents, which makes it difficult to devise methods that can be generalizable to a broader range of sources. In this paper we propose a data-driven document analysis framework based on machine learning that focuses on classifying regions of interest at pixel level. For that, we make use of Convolutional Neural Networks trained to infer the category of each pixel. The main advantage of this approach is that it can be applied regardless of the type of document provided, as long as training data is available. Since this work represents first efforts in that direction, our experimentation focuses on reporting a baseline classification using our framework. The experiments show promising performance, achieving an accuracy around 90% in two corpora of old music documents.","PeriodicalId":316356,"journal":{"name":"2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","volume":"103 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Pixelwise classification for music document analysis\",\"authors\":\"Jorge Calvo-Zaragoza, Gabriel Vigliensoni, Ichiro Fujinaga\",\"doi\":\"10.1109/IPTA.2017.8310134\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Content within musical documents not only contains music symbol but also include different elements such as staff lines, text, or frontispieces. Before attempting to automatically recognize components in these layers, it is necessary to perform an analysis of the musical documents in order to detect and classify each of these constituent parts. The obstacle for this analysis is the high heterogeneity amongst music collections, especially with ancient documents, which makes it difficult to devise methods that can be generalizable to a broader range of sources. In this paper we propose a data-driven document analysis framework based on machine learning that focuses on classifying regions of interest at pixel level. For that, we make use of Convolutional Neural Networks trained to infer the category of each pixel. The main advantage of this approach is that it can be applied regardless of the type of document provided, as long as training data is available. Since this work represents first efforts in that direction, our experimentation focuses on reporting a baseline classification using our framework. The experiments show promising performance, achieving an accuracy around 90% in two corpora of old music documents.\",\"PeriodicalId\":316356,\"journal\":{\"name\":\"2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)\",\"volume\":\"103 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPTA.2017.8310134\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPTA.2017.8310134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Pixelwise classification for music document analysis
Content within musical documents not only contains music symbol but also include different elements such as staff lines, text, or frontispieces. Before attempting to automatically recognize components in these layers, it is necessary to perform an analysis of the musical documents in order to detect and classify each of these constituent parts. The obstacle for this analysis is the high heterogeneity amongst music collections, especially with ancient documents, which makes it difficult to devise methods that can be generalizable to a broader range of sources. In this paper we propose a data-driven document analysis framework based on machine learning that focuses on classifying regions of interest at pixel level. For that, we make use of Convolutional Neural Networks trained to infer the category of each pixel. The main advantage of this approach is that it can be applied regardless of the type of document provided, as long as training data is available. Since this work represents first efforts in that direction, our experimentation focuses on reporting a baseline classification using our framework. The experiments show promising performance, achieving an accuracy around 90% in two corpora of old music documents.