{"title":"A PHOC Decoder for Lexicon-Free Handwritten Word Recognition","authors":"Giorgos Sfikas, George Retsinas, B. Gatos","doi":"10.1109/ICDAR.2017.90","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.90","url":null,"abstract":"In this paper, we propose a novel probabilistic model for lexicon-free handwriting recognition. Model inputs are word images encoded as Pyramidal Histogram Of Character (PHOC) vectors. PHOC vectors have been used as efficient attribute-based, multi-resolution representations of either text strings or word image contents. The proposed model formulates PHOC decoding as the problem of finding the most probable sequence of characters corresponding to the given PHOC. We model PHOC layers as Beta-distributed observations, linked to hidden states that correspond to character estimates. Characters are in turn linked to one another along a Markov chain, encoding language model information. The sequence of characters is estimated using the max-sum algorithm in a process that is akin to Viterbi decoding. Numerical experiments on the well-known George Washington database show competitive recognition results.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117163989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Graph-Based Deep Learning for Graphics Classification","authors":"Pau Riba, Anjan Dutta, J. Lladós, A. Fornés","doi":"10.1109/ICDAR.2017.262","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.262","url":null,"abstract":"Graph-based representations are a common way to deal with graphics recognition problems. However, previous works were mainly focused on developing learning-free techniques. The success of deep learning frameworks has proved that learning is a powerful tool to solve many problems; however, it is not straightforward to extend these methodologies to non-Euclidean data such as graphs. On the other hand, graphs are a good representational structure for graphical entities. In this work, we present some deep learning techniques that have been proposed in the literature for graph-based representations, and we show how they can be used in graphics recognition problems.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125760230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Text Detection in Born-Digital Images via Fully Convolutional Networks","authors":"Nibal Nayef, J. Ogier","doi":"10.1109/ICDAR.2017.145","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.145","url":null,"abstract":"Traditional layout analysis methods cannot be easily adapted to born-digital images, which carry properties from both regular document images and natural scene images. One layout approach for analyzing born-digital images is to separate the text layer from the graphics layer before further analyzing either of them. In this paper, we propose a method for detecting text regions in such images by casting the detection problem as a semantic object segmentation problem. The text classification is done in a holistic approach using fully convolutional networks, where the full image is fed as input to the network and the output is a pixel heat map of the same size as the input image. This solves the problem of low-resolution images and the variability of text scale within one image. It also eliminates the need for finding interest points, candidate text locations or low-level components. The experimental evaluation of our method on the ICDAR 2013 dataset shows that our method outperforms state-of-the-art methods. The detected text regions also allow flexibility to later apply methods for finding text components at character, word or textline levels in different orientations.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124715059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Detection and Recognition of Arabic Text in Video Frames","authors":"W. Ohyama, Seiya Iwata, T. Wakabayashi, F. Kimura","doi":"10.1109/ICDAR.2017.360","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.360","url":null,"abstract":"The authors have developed an end-to-end system for Arabic text recognition in video frames. The end-to-end system consists of the steps of text-line detection, word segmentation and word recognition. In order to achieve high text recognition accuracy, we propose a new integrated text detection-recognition scheme, where the true text-lines are detected with as high a recall rate as possible and the false words in the false lines are rejected in the subsequent word recognition step. We previously reported a recognition-based transition frame detection of Arabic news captions in single-channel video images. In this paper, the recognition system is integrated with an n-gram language model and extended to text detection/recognition of multi-channel video images. The multi-channel, multi-font performance of the system is experimentally evaluated using the AcTiV-D and AcTiV-R datasets. The multi-channel text detection performance for three channels, France24, Russia Today and TunisiaNat1, is 91.29% in F-measure. The multi-channel, multi-font character recognition performance for these channels is 94.84% in F-measure.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"177 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124746405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Local Enlacement Histograms for Historical Drop Caps Style Recognition","authors":"Michaël Clément, Mickaël Coustaty, Camille Kurtz, L. Wendling","doi":"10.1109/ICDAR.2017.57","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.57","url":null,"abstract":"This article focuses on the specific issue of drop caps image recognition in the context of cultural heritage preservation. Due to their heterogeneity and their weakly structured properties, these historical images represent challenging data. An important aspect in the recognition process of drop caps is their background styles, which can be considered as discriminative features to identify both the printer and the period. Most existing methods for style recognition are based on low-level features such as color or texture properties. In this article, we present a novel framework for the recognition of drop caps style based on higher-level features. We propose to capture the spatial structure carried by these images using relative position descriptors modeling the enlacement between local cells of pixel layers obtained from a document segmentation step. Such descriptors are then exploited in an efficient bag-of-features learning procedure. Experimental results obtained on a dataset of historical drop caps images highlight the interest of this approach, and in particular the benefit of considering spatial information.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128735497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nonlinear Manifold Embedding on Keyword Spotting Using t-SNE","authors":"George Retsinas, N. Stamatopoulos, G. Louloudis, Giorgos Sfikas, B. Gatos","doi":"10.1109/ICDAR.2017.86","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.86","url":null,"abstract":"Nonlinear manifold embedding has attracted considerable attention due to its highly-desired property of efficiently encoding local structure, i.e. intrinsic space properties, into a low-dimensional space. The benefit of such an approach is twofold: it leads to compact representations while addressing the often-encountered curse of dimensionality. The latter plays an important role in retrieval applications, such as keyword spotting, where a sorted list of retrieved objects with respect to a distance metric is required. In this work, we explore the efficiency of the popular manifold embedding method t-distributed Stochastic Neighbor Embedding (t-SNE) on the Query-by-Example keyword spotting task. The main contribution of this work is the extension of t-SNE in order to support out-of-sample (OOS) embedding which is essential for mapping query images to the embedding space. The experimental results demonstrate a significant increase in keyword spotting performance when the word similarity is calculated on the embedding space.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128645402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition","authors":"Jianshu Zhang, Jun Du, Lirong Dai","doi":"10.1109/ICDAR.2017.152","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.152","url":null,"abstract":"In this study, we present a novel end-to-end approach based on the encoder-decoder framework with the attention mechanism for online handwritten mathematical expression recognition (OHMER). First, the input two-dimensional ink trajectory information of handwritten expression is encoded via the gated recurrent unit based recurrent neural network (GRU-RNN). Then the decoder is also implemented by the GRU-RNN with a coverage-based attention model. The proposed approach can simultaneously accomplish the symbol recognition and structural analysis to output a character sequence in LaTeX format. Validated on the CROHME 2014 competition task, our approach significantly outperforms the state-of-the-art with an expression recognition accuracy of 52.43% by only using the official training dataset. Furthermore, the alignments between the input trajectories of handwritten expressions and the output LaTeX sequences are visualized by the attention mechanism to show the effectiveness of the proposed method.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131027896","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lexicographical-Based Order for Post-OCR Correction of Named Entities","authors":"Axel Jean-Caurant, Nouredine Tamani, V. Courboulay, J. Burie","doi":"10.1109/ICDAR.2017.197","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.197","url":null,"abstract":"We are in the era of information access, in which a huge amount of text is extracted from scanned documents and made available digitally to be used in search processes. However, old or poorly scanned documents suffer from bad recognition, which leads not only to imperfect Optical Character Recognition (OCR), but also to poor indexing and inaccessible information. To cope with the aforementioned issues, we introduce in this paper a lexicographical-based approach for post-OCR correction applied to named entities. By lexicographically combining a contextual similarity and an edit distance, the approach builds a graph connecting similar named entities, in order to automatically correct the corresponding OCR-processed text. We evaluated our approach on a generated dataset. The first results obtained showed that, despite the high level of degradation of the text, the approach succeeded in correcting more than a third of the named entities without the need for any external knowledge.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"197 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123730181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Texture Descriptors for Validation of Counterfeit Documents","authors":"Albert Berenguel Centeno, O. R. Terrades, Josep Lladós Canet, Cristina Cañero Morales","doi":"10.1109/ICDAR.2017.204","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.204","url":null,"abstract":"This paper describes an exhaustive comparative analysis and evaluation of different existing texture descriptor algorithms to differentiate between genuine and counterfeit documents. We include in our experiments different categories of algorithms and compare them in different scenarios with several counterfeit datasets, comprising banknotes and identity documents. Computational time in the extraction of each descriptor is important because the final objective is to use it in a real industrial scenario. HoG- and CNN-based descriptors stand out statistically over the rest in terms of the F1-score/time ratio performance.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130891934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Into the Colorful World of Webtoons: Through the Lens of Neural Networks","authors":"Ceyda Cinarel, Byoung-Tak Zhang","doi":"10.1109/ICDAR.2017.289","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.289","url":null,"abstract":"The task of colorizing black-and-white images has previously been explored for natural images. In this paper we look at the task of colorization on a different domain: webtoons. To our knowledge this type of dataset has not been used before. Webtoons are usually produced in color, so they make a good dataset for analyzing different colorization models. Comics like webtoons also present some additional challenges over natural images, such as occlusion by speech bubbles and text. First we look at the performance of some previously introduced models on this task and suggest modifications to address their problems. We propose a new model composed of two networks: one network generates sparse color information, and a second network uses this generated color information as input to apply color to the whole image. These two networks are trained end-to-end. Our proposed model solves some of the problems observed with other architectures, resulting in better colorizations.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123206492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}