Michele Alberti, Manuel Bouillon, R. Ingold, M. Liwicki
{"title":"Open Evaluation Tool for Layout Analysis of Document Images","authors":"Michele Alberti, Manuel Bouillon, R. Ingold, M. Liwicki","doi":"10.1109/ICDAR.2017.311","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.311","url":null,"abstract":"This paper presents an open tool for standardizing the evaluation process of the layout analysis task of document images at pixel level. We introduce a new evaluation tool that is both available as a standalone Java application and as a RESTful web service. This evaluation tool is free and open-source in order to be a common tool that anyone can use and contribute to. It aims at providing as many metrics as possible to investigate layout analysis predictions, and also provides an easy way of visualizing the results. This tool evaluates document segmentation at pixel level, and supports multi-labeled pixel ground truth. Finally, this tool has been successfully used for the ICDAR 2017 competition on Layout Analysis for Challenging Medieval Manuscripts.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"119 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126297760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Clausner, A. Antonacopoulos, T. Derrick, S. Pletschacher
{"title":"ICDAR2017 Competition on Recognition of Early Indian Printed Documents - REID2017","authors":"C. Clausner, A. Antonacopoulos, T. Derrick, S. Pletschacher","doi":"10.1109/ICDAR.2017.230","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.230","url":null,"abstract":"This paper presents an objective comparative evaluation of page analysis and recognition methods for historical documents with text mainly in Bengali language and script. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2017, presenting the results of the evaluation of seven methods – three submitted and four variations of open source state-of-the-art systems. The focus is on optical character recognition (OCR) performance. Different evaluation metrics were used to gain an insight into the algorithms, including new character accuracy metrics to better reflect the difficult circumstances presented by the documents. The results indicate that deep learning approaches are the most promising, but there is still a considerable need to develop robust methods that deal with challenges of historic material of this nature.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131961128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICDAR2017 Competition on Recognition of Documents with Complex Layouts - RDCL2017","authors":"C. Clausner, A. Antonacopoulos, S. Pletschacher","doi":"10.1109/ICDAR.2017.229","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.229","url":null,"abstract":"This paper presents an objective comparative evaluation of page segmentation and region classification methods for documents with complex layouts. It describes the competition (modus operandi, dataset and evaluation methodology) held in the context of ICDAR2017, presenting the results of the evaluation of seven methods – five submitted, two state-of-the-art systems (commercial and open-source). Three scenarios are reported in this paper, one evaluating the ability of methods to accurately segment regions and two evaluating both segmentation and region classification (one focusing only on text regions). For the first time, nested region content (table cells, chart labels etc.) is evaluated in addition to the top-level page content. Text recognition was a bonus challenge and was not taken up by all participants. The results indicate that an innovative approach has a clear advantage but there is still a considerable need to develop robust methods that deal with layout challenges, especially with the non-textual content.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134434687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Table of Contents Extraction by System Aggregation","authors":"Thi-Tuyet-Hai Nguyen, A. Doucet, Mickaël Coustaty","doi":"10.1109/ICDAR.2017.48","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.48","url":null,"abstract":"OCR-ed books usually lack logical structure information, such as chapters and sections. To enrich the navigation experience of users, several approaches have been proposed to extract the table of contents (ToC) from digitised books. In this paper, we introduce an aggregation-based method to enhance ToC extraction using system submissions from the ICDAR Book structure extraction competitions (2009, 2011, and 2013). Our experimental results show that the union of the two best approaches outperforms the existing approaches using both the title-based and link-based evaluation measures on a dataset of more than 2000 books. By efficiently combining the results of existing systems in an unsupervised way, we consistently beat the state-of-the-art in book structure extraction, with performance improvements that are statistically significant.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123980829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Cloppet, V. Eglin, Marlene Helias-Baron, V. C. Kieu, N. Vincent, D. Stutzmann
{"title":"ICDAR2017 Competition on the Classification of Medieval Handwritings in Latin Script","authors":"F. Cloppet, V. Eglin, Marlene Helias-Baron, V. C. Kieu, N. Vincent, D. Stutzmann","doi":"10.1109/ICDAR.2017.224","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.224","url":null,"abstract":"This paper presents the results of the ICDAR2017 Competition on the Classification of Medieval Handwritings in Latin Script (CLaMM), jointly organized by Computer Scientists and Humanists (paleographers). This work follows a competition at ICFHR2016 and aims at providing a rich annotated database of European medieval manuscripts to the community on Handwriting Analysis and Recognition. We proposed four independent classification tasks which attracted 10 registered teams, with 6 submitted classifiers from 4 participants. Those classifiers are trained on a set of 3540 images with their ground truths. In task 1 (Script classification) and task 3 (Date classification), the classifiers have been evaluated on a test set of 2000 greyscale TIFF images at 300 dpi. In task 2 (Script classification) and task 4 (Date classification), the test set consists of 1000 images in different formats, resolutions and color representations. The best scores are respectively 85.2% for task 1, 76.5% for task 2, 59% for task 3, and 49.9% for task 4. An analysis based on the confusion matrix of each classifier is also given.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129435840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Study of the Factors Influencing OCR Stability for Hybrid Security","authors":"Sébastien Eskenazi, Petra Gomez-Krämer, J. Ogier","doi":"10.1109/ICDAR.2017.388","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.388","url":null,"abstract":"Optical character recognition (OCR) is a critical task in securing hybrid (digital and paper) documents. For this, its key performance criterion is stability. An unstable OCR algorithm will fail to detect two copies of the document as similar, thus creating a false fraud detection. Having a sufficiently stable algorithm requires a very high level of performance. To improve it, we study a simple disambiguation technique called \"alphabet reduction\". It is based on the principle that characters that are visually similar should be the same character. It significantly improves the stability of two state-of-the-art OCR algorithms on almost forty-three thousand images. Yet the obtained stability is still insufficient. We also study the impact of the document variations on the stability of OCR algorithms.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114997486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maroua Mehri, P. Héroux, Julien Lerouge, R. Mullot
{"title":"Page Retrieval System in Digitized Historical Books Based on Error-Tolerant Subgraph Matching","authors":"Maroua Mehri, P. Héroux, Julien Lerouge, R. Mullot","doi":"10.1109/ICDAR.2017.193","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.193","url":null,"abstract":"Developing smart ways of interacting with scanners is one of the emerging needs identified by numerous digitization professionals. To achieve better interaction with scanners, the research community in historical document image analysis is particularly interested in providing reliable tools for computer-aided indexing and retrieval of historical document images. Thus, we propose in this article a method able to retrieve from a digitized historical book, pages having layout and/or content which meet the user-defined query. Amongst the user-defined queries we focus on the transition pages (e.g. title pages of chapter, end-of-chapter and end-of-act) and pages containing a particular content component or a group of patterns (e.g. ornaments, illustrations and drop caps) in our work. The method adopted in this work is firstly based on using low-level features (texture, shape and geometric descriptors) to represent each page in the form of a graph-based signature. Then, a set of costs is estimated using an error-tolerant subgraph isomorphism algorithm in order to measure the similarity between the user-defined query formulated in terms of a pattern graph and the different subgraphs of the book page signatures and to find book pages similar to the user-defined query. To illustrate the effectiveness of the proposed method, a thorough experimental study has been conducted with quantitative observations obtained from a large number of queries having different contents and structures.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121220287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Théodore Bluche, Christopher Kermorvant, C. Touzet, H. Glotin
{"title":"Cortical-Inspired Open-Bigram Representation for Handwritten Word Recognition","authors":"Théodore Bluche, Christopher Kermorvant, C. Touzet, H. Glotin","doi":"10.1109/ICDAR.2017.21","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.21","url":null,"abstract":"Recent research in the cognitive process of reading hypothesized that we do not read words by sequentially recognizing letters, but rather by identifying open-bigrams, i.e. pairs of letters that are not necessarily next to each other. In this paper, we evaluate a handwritten word recognition method based on an original open-bigram representation. We trained Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) to predict open-bigrams rather than characters, and we show that such models are able to learn the long-range, complicated and intertwined dependencies in the input signal, necessary to the prediction. For decoding, we decomposed each word of a large vocabulary into the set of constituent bigrams, and applied a simple cosine similarity measure between this representation and the bagged RNN prediction to retrieve the vocabulary word. We compare this method to standard word recognition techniques based on sequential character recognition. Experiments are carried out on two public databases of handwritten words (Rimes and IAM). The results with our bigram decoder are comparable to more conventional decoding methods based on sequences of letters.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124794633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jordan Drapeau, T. Géraud, Mickaël Coustaty, J. Chazalon, J. Burie, V. Eglin, S. Bres
{"title":"Extraction of Ancient Map Contents Using Trees of Connected Components","authors":"Jordan Drapeau, T. Géraud, Mickaël Coustaty, J. Chazalon, J. Burie, V. Eglin, S. Bres","doi":"10.1109/ICDAR.2017.249","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.249","url":null,"abstract":"Ancient maps are an historical and cultural heritage widely recognized as a very important source of information, but exploiting such maps is complicated. In this project, we consider the Linguistic Atlas of France (ALF), built between 1902 and 1910. This cartographical heritage produces first-rate data for dialectological research. In this paper, we focus on the separation of the content in layers for facilitating the extraction, the analysis, the visualization and the diffusion of the data contained in these ancient linguistic atlases.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130134817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kwon-Young Choi, Bertrand Coüasnon, Y. Ricquebourg, R. Zanibbi
{"title":"Bootstrapping Samples of Accidentals in Dense Piano Scores for CNN-Based Detection","authors":"Kwon-Young Choi, Bertrand Coüasnon, Y. Ricquebourg, R. Zanibbi","doi":"10.1109/ICDAR.2017.257","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.257","url":null,"abstract":"State-of-the-art Optical Music Recognition systems often fail to process dense and damaged music scores, where many symbols can present complex segmentation problems. We propose to resolve these segmentation problems by using a CNN-based detector trained with little manually annotated data. A data augmentation bootstrapping method is used to accurately train a deep learning model to do the localization and classification of an accidental symbol associated with a note head, or the note head if there is no accidental. Using 5-fold cross-validation, we obtain an average of 98.5% localization with an IoU score over 0.5 and a classification accuracy of 99.2%.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134481763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}