{"title":"Human-Directed Optical Music Recognition","authors":"Liang Chen, C. Raphael","doi":"10.2352/ISSN.2470-1173.2016.17.DRR-053","DOIUrl":"https://doi.org/10.2352/ISSN.2470-1173.2016.17.DRR-053","url":null,"abstract":"We propose a human-in-the-loop scheme for optical music recognition. Starting from the results of our recognition engine, we pose the problem as one of constrained optimization, in which the human can specify various pixel labels, while our recognition engine seeks an optimal explanation subject to the humansupplied constraints. In this way we enable an interactive approach with a uniform communication channel from human to machine where both iterate their roles until the desired end is achieved. Pixel constraints may be added to various stages, including staff finding, system identification, and measure recognition. Results on a test show significant speed up when compared to purely human-driven correction. Introduction Optical Music Recognition (OMR) holds potential to transform score images into symbolic music libraries, thus enabling search, categorization, and retrieval by symbolic content, as we now take for granted with text. Such symbolic libraries would serve as the foundation for the emerging field of computational musicology, and provide data for a wide variety of fusions between music, computer science, and statistics. Equally exciting are applications such as the digital music stand, and systems that support practice and learning through objective analysis of rhythm and pitch. In spite of this promise, progress in OMR has been slow; even the best systems, both commercial and academic, leave much to be desired[7]. In many cases the effort needed to correct OMR output may be more than that of entering the music data from scratch[8]. In such cases OMR systems fail to make any meaningful contribution at all. The reason for these disappointing results is simply that OMR is hard. 
Bainbridge [17] discusses some of the challenges of OMR that impede its development. One central problem is that music notation contains a large variety of somewhat-rare musical symbols and conventions [4], such as articulations, bowings, tremolos, fingerings, accents, harmonics, stops, repeat marks, 1st and 2nd endings, dal segno and da capo markings, trills, mordents, turns, breath marks, etc. While one can easily build recognizers that accommodate these somewhat-unusual symbols and special notational cases, the false-positive detections that result often outweigh the additional correct detections they produce. Under some circumstances, even not-so-rare symbols fall into this better-not-to-recognize category, such as augmentation dots, double sharps, and partial beams. Another issue arises from the difficulty of describing the high-level structure of music notation. Objects such as chords, beamed groups, and clef-key signatures are highly structured and lend themselves naturally to grammatical representation; however, the overall organization of symbols within a measure is far less constrained. The OMR literature contains several efforts to formulate a unified grammar for music notation [10, 11]. These approaches represent grammars of primitive symbols (beams, fla","PeriodicalId":152377,"journal":{"name":"Document Recognition and Retrieval","volume":"149 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116190687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
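The constrained-optimization idea in the abstract above can be illustrated with a toy sketch: the engine scores candidate pixel-label explanations, and the human's pixel constraints prune the candidate set before the engine picks the best survivor. The candidate scores, pixel coordinates, and labels below are hypothetical, not the paper's actual recognition engine.

```python
# Minimal sketch of constraint-respecting recognition (all names hypothetical).
# Each candidate "explanation" assigns a label to each pixel; the engine returns
# the highest-scoring candidate consistent with human-supplied pixel constraints.

def best_explanation(candidates, constraints):
    """candidates: list of (score, {pixel: label}); constraints: {pixel: label}."""
    feasible = [
        (score, labels) for score, labels in candidates
        if all(labels.get(px) == lab for px, lab in constraints.items())
    ]
    if not feasible:
        return None  # no candidate satisfies the human's constraints
    return max(feasible, key=lambda c: c[0])[1]

candidates = [
    (0.9, {(10, 4): "staff", (11, 4): "note"}),
    (0.7, {(10, 4): "note",  (11, 4): "note"}),
]
# Human asserts that pixel (10, 4) belongs to a note head, overriding the
# engine's top-scoring (but wrong) explanation:
print(best_explanation(candidates, {(10, 4): "note"}))
```

Iterating this loop (recognize, constrain, re-recognize) is the uniform human-to-machine channel the abstract describes.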
{"title":"Recognizing Predatory Chat Documents using Semi-supervised Anomaly Detection","authors":"M. Ebrahimi, C. Suen, O. Ormandjieva, A. Krzyżak","doi":"10.2352/ISSN.2470-1173.2016.17.DRR-063","DOIUrl":"https://doi.org/10.2352/ISSN.2470-1173.2016.17.DRR-063","url":null,"abstract":"Chat-logs are informative documents available to nowadays social network providers. Providers and law enforcement tend to use these huge logs anonymously for automatic online Sexual Predator Identification (SPI) which is a relatively new area of application. The task plays an important role in protecting children and juveniles against being exploited by online predators. Pattern recognition techniques facilitate automatic identification of harmful conversations in cyber space by law enforcements. These techniques usually require a large volume of high-quality training instances of both predatory and non-predatory documents. However, collecting non-predatory documents is not practical in real-world applications, since this category contains a large variety of documents with many topics including politics, sports, science, technology and etc. We utilized a new semi-supervised approach to mitigate this problem by adapting an anomaly detection technique called One-class Support Vector Machine which does not require non-predatory samples for training. We compared the performance of this approach against other state-ofthe-art methods which use both positive and negative instances. We observed that although anomaly detection approach utilizes only one class label for training (which is a very desirable property in practice); its performance is comparable to that of binary SVM classification. In addition, this approach outperforms the classic two-class Naïve Bayes algorithm, which we used as our baseline, in terms of both classification accuracy and precision. 
Introduction During the past decade, automated online Sexual Predator Identification from chat documents has advanced rapidly by means of pattern recognition techniques capable of flagging likely predators for the attention of law enforcement. The most common approach was presented in the PAN-2012 international competition [1], which was specifically engineered to accomplish the following two tasks [2]: (1) distinguishing the predators from the victims, and (2) finding the predatory messages within a predatory document. The first task seems more important for law enforcement since it can help them limit their search space drastically. It is worth mentioning that the second task has not been as successful as the first, due to the fact that it requires deeper natural language analysis. The first task can be performed in two steps [3]: (1) identifying the predatory documents in the entire conversation corpus, and (2) searching among the participants of predatory documents to distinguish the sexual predator from the victim. In this paper we focus on the first step (i.e., identifying the predatory conversations), since it is the most useful step for helping investigators in real-world applications. 
Accordingly, the main motivation behind using a One-class SVM on this kind of data and treating the problem as one of anomaly detection is to build a classifier able to learn from only one class label instead of what we","PeriodicalId":152377,"journal":{"name":"Document Recognition and Retrieval","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123935648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
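The one-class training setup this abstract describes can be sketched with scikit-learn's `OneClassSVM`. The feature vectors below are random stand-ins: real inputs would come from a chat-log vectorization step the abstract does not detail, and the `nu`/`gamma` values are illustrative, not the paper's.

```python
# Sketch of semi-supervised SPI: train only on predatory-class feature vectors,
# then flag anything dissimilar as outside that class. No negative (non-predatory)
# samples are needed at training time, which is the point of the one-class setup.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_predatory = rng.normal(loc=0.0, scale=0.5, size=(100, 2))  # stand-in features

clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)
clf.fit(X_predatory)  # one class label only

# +1 = resembles the training class; -1 = anomaly (here: non-predatory)
print(clf.predict([[0.1, -0.2], [10.0, 10.0]]))
```

In this toy geometry a point far from the training cloud, such as `[10.0, 10.0]`, is predicted `-1`; the binary-SVM baselines the abstract compares against would instead require explicit non-predatory training data.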
{"title":"Simple and Fast Geometrical Descriptors for Writer Identification","authors":"A. Garz, Marcel Würsch, Andreas Fischer, R. Ingold","doi":"10.2352/ISSN.2470-1173.2016.17.DRR-055","DOIUrl":"https://doi.org/10.2352/ISSN.2470-1173.2016.17.DRR-055","url":null,"abstract":"","PeriodicalId":152377,"journal":{"name":"Document Recognition and Retrieval","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131737798","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic document reading system for technical drawings","authors":"Jenn-Kwei Tyan, M. Fang","doi":"10.1117/12.450719","DOIUrl":"https://doi.org/10.1117/12.450719","url":null,"abstract":"The reading of technical drawings is a complex task for automatic document processing. In this paper we present a system for reading textual descriptions from technical drawings, which provides capabilities for converting paper-based documents into an electronic archiving database system. The proposed system consists of four major processing elements: form learning, form localization, optical character recognition (OCR), and result verification. The algorithms of each element are dedicated to solve the practical problems in reading technical drawing documents. Among them, form localization and OCR are the key processes for automation. Experimental results have shown the feasibility and efficiency of our approaches.","PeriodicalId":152377,"journal":{"name":"Document Recognition and Retrieval","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131741537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Form classification","authors":"K. Reddy, V. Govindaraju","doi":"10.1117/12.766737","DOIUrl":"https://doi.org/10.1117/12.766737","url":null,"abstract":"The problem of form classification is to assign a single-page form image to one of a set of predefined form types or classes. We classify the form images using low level pixel density information from the binary images of the documents. In this paper, we solve the form classification problem with a classifier based on the k-means algorithm, supported by adaptive boosting. Our classification method is tested on the NIST scanned tax forms data bases (special forms databases 2 and 6) which include machine-typed and handwritten documents. Our method improves the performance over published results on the same databases, while still using a simple set of image features.","PeriodicalId":152377,"journal":{"name":"Document Recognition and Retrieval","volume":"159 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120898601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}