{"title":"Smart IDReader: Document Recognition in Video Stream","authors":"K. Bulatov, V. Arlazarov, T. S. Chernov, O. Slavin, D. Nikolaev","doi":"10.1109/ICDAR.2017.347","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.347","url":null,"abstract":"This work is devoted to an identity document recognition system design for use in mobile phones and tablets using the computational capabilities of the device itself. Key differences are discussed in relation to conservative cloud recognition systems which commonly use single images as an input by design. A mobile recognition system chart is presented which is constructed with computational limitations in mind and which is implemented in a commercial solution. An original approach designed to improve recognition precision and reliability using post-OCR results integration in video stream, as opposed to approaches which rely on frame image integration using \"super-resolution\" algorithms. An interactive feedback between the system and its operator is discussed, such as automatic video stream recognition stopping decision. Experimental results are presented for an implemented commercial system \"Smart IDReader\" designed for identity documents recognition.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133263143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Long Term Memory Recognition Framework on Multi-Complexity Motion Gestures","authors":"Songbin Xu, Yang Xue","doi":"10.1109/ICDAR.2017.41","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.41","url":null,"abstract":"Most existing researches on inertial sensor based dynamic motion gestures use deterministic or stochastic methods, however, these models generally possess short term memory so that they only memorize few time steps before and ignore the historical information deeper in time. Furthermore, researchers mainly investigate on the primary level gestures, while gestures with higher complexity are more powerful in expression. In this paper, we implement an end-to-end framework for recognition on multi-complexity dynamic motion gestures using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). Since the lack of available motion database, we collected three databases of motion gestures in different levels of complexity. Motion gesture signals were carefully pre-processed and sent for training without feature extraction. Results on 5-folds cross validation prove that our framework has good recognition and real-time performance on different types of gestures, and shows robustness to the invalid segments, and the time consumption of recognition keeps stable when gesture classes increase.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133290266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Core Region Detection for Off-Line Unconstrained Handwritten Latin Words Using Word Envelops","authors":"Shilpa Pandey, Gaurav Harit","doi":"10.1109/ICDAR.2017.108","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.108","url":null,"abstract":"Zone extraction is acclaimed as a significant pre-processing step in handwriting analysis. This paper presents a new method for separating ascenders and descenders from an unconstrained handwritten word and identifying its core-region. The method estimates correct core-region for complexities like long horizontal strokes, skewed words, first letter capital, hill and dale writing, jumping baselines and words with long descender curves, cursive handwriting, calligraphic words, title case words, very short words as shown in Fig. 1. It extracts two envelops from the word image and selects sample points that constitute the core region envelop. The method is tested on CVL, ICDAR-2013, ICFHR-2012, and IAM benchmark datasets of handwritten words written by multiple writers. We also created our own dataset of 100 words authored by 2 writers comprising all the above mentioned handwriting complexities. Due to non-availability of the Ground Truth for core-region extraction we created it manually for all the datasets. Our work reports an accuracy of 90.16% for correctly identifying all the three zones on 17,100 Latin words written by 802 individuals. Promising results are obtained by our core-region detection method when compared with the current state of the art methods.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131852718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Spatially Embedded Discriminative Part Detectors for Scene Character Recognition","authors":"Yanna Wang, Cunzhao Shi, Baihua Xiao, Chunheng Wang","doi":"10.1109/ICDAR.2017.67","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.67","url":null,"abstract":"Recognizing scene character is extremely challenging due to various interference factors such as character translation, blur and uneven illumination, etc. Considering that characters are composed of a series of parts and different parts attract diverse attentions when people observe a character, we should assign different importance to each part to recognize scene character. In this paper, we propose a discriminative character representation by aggregating the responses of the spatially embedded salient part detectors. Specifically, we first extract the convolution activations from the pre-trained convolutional neural network (CNN). These convolutional activations are considered as the local descriptors of the character parts. Then we learn a set of part detectors and pick the distinctive convolutional activations which respond to the salient parts. Moreover, to alleviate the effect of character translation, rotation and deformation, etc, we assign a response region for each part detector and search the maximal response in this region. Finally, we aggregate the maximal outputs of all the salient part detectors to represent character. The experiments on three datasets show the effectiveness of the proposed method for scene character recognition.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"769 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134303780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Word String Embeddings and Loss Functions for CNN-Based Word Spotting","authors":"Sebastian Sudholt, G. Fink","doi":"10.1109/ICDAR.2017.87","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.87","url":null,"abstract":"The recent past has seen CNNs take over the field of word spotting. The dominance of these neural networks is fueled by learning to predict a word string embedding for a given input image. While the PHOC (Pyramidal Histogram of Characters) is most prominently used, other embeddings such as the Discrete Cosine Transform of Words have been used as well. In this work, we investigate the use of different word string embeddings for word spotting. For this, we make use of the recently proposed PHOCNet and modify it to be able to not only learn binary representations. Our extensive evaluation shows that a large number of combinations of word string embeddings and loss functions achieve roughly the same results on different word spotting benchmarks. This leads us to the conclusion that no word string embedding is really superior to another and new embeddings should focus on incorporating more information than only character counts and positions.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115727748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Document Image Dewarping Method Using Text-Lines and Line Segments","authors":"T. Kil, Wonkyo Seo, H. Koo, N. Cho","doi":"10.1109/ICDAR.2017.146","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.146","url":null,"abstract":"Conventional text-line based document dewarping methods have problems when handling complex layout and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take large portion of the input instead. Hence, for the robust document dewarping, we propose to use line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that many of the line segments in the image are horizontally or vertically aligned in the well-rectified images, we encode this property into the cost function in addition to the text-line alignment cost. By minimizing the function, we can obtain transformation parameters for camera pose, page curve, etc., which are used for document rectification. Considering that there are many outliers in line segment directions and missed text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, we remove text components and line segments that are not well aligned, and then minimize the cost function with the updated information. Experimental results show that the proposed method is robust to the variety of page layouts.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114255735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Document Image Classification Using Deep CNN and Extreme Learning Machines","authors":"Andreas Kölsch, Muhammad Zeshan Afzal, Markus Ebbecke, M. Liwicki","doi":"10.1109/ICDAR.2017.217","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.217","url":null,"abstract":"This paper presents an approach for real-time training and testing for document image classification. In production environments, it is crucial to perform accurate and (time-)efficient training. Existing deep learning approaches for classifying documents do not meet these requirements, as they require much time for training and fine-tuning the deep architectures. Motivated from Computer Vision, we propose a two-stage approach. The first stage trains a deep network that works as feature extractor and in the second stage, Extreme Learning Machines (ELMs) are used for classification. The proposed approach outperforms all previously reported structural and deep learning based methods with a final accuracy of 83.24% on Tobacco-3482 dataset, leading to a relative error reduction of 25% when compared to a previous Convolutional Neural Network (CNN) based approach (DeepDocClassifier). More importantly, the training time of the ELM is only 1.176 seconds and the overall prediction time for 2,482 images is 3.066 seconds. As such, this novel approach makes deep learning-based document classification suitable for large-scale real-time applications.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114100176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Text Encoding for Text Classification Using Convolutional Neural Networks","authors":"I. Gallo, Shah Nawaz, Alessandro Calefati","doi":"10.1109/ICDAR.2017.323","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.323","url":null,"abstract":"In this paper, we encode semantics of a text document in an image to take advantage of the same Convolutional Neural Networks (CNNs) that have been successfully employed to image classification. We use Word2Vec, which is an estimation of word representation in a vector space that can maintain the semantic and syntactic relationships among words. Word2Vec vectors are transformed into graphical words representing sequence of words in the text document. The encoded images are classified by using the AlexNet architecture. We introduced a new dataset named Text-Ferramenta gathered from an Italian price comparison website and we evaluated the encoding scheme through this dataset along with two publicly available datasets i.e. 20news-bydate and StackOverflow. Our scheme outperforms the text classification approach based on Doc2Vec and Support Vector Machine (SVM) when all the words of a text document can be completely encoded in an image. We believe that the results on these datasets are an interesting starting point for many Natural Language Processing works based on CNNs, such as a multimodal approach that could use a single CNN to classify both image and text information.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115395494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Residual Recurrent Neural Network with Sparse Training for Offline Arabic Handwriting Recognition","authors":"Ruijie Yan, Liangrui Peng, GuangXiang Bin, Shengjin Wang, Yao Cheng","doi":"10.1109/ICDAR.2017.171","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.171","url":null,"abstract":"Deep Recurrent Neural Networks (RNN) have been suffering from the overfitting problem due to the model redundancy of the network structures. We propose a novel temporal and spatial residual learning method for RNN, followed with sparse training by weight pruning to gain sparsity in network parameters. For a Long Short-Term Memory (LSTM) network, we explore the combination schemes and parameter settings for temporal and spatial residual learning with sparse training. Experiments are carried out on the IFN/ENIT database. For the character error rate on the testing set e while training with sets a, b, c, d, the previously reported best result is 13.42%, and the proposed configuration of temporal residual learning followed with sparse training achieves the state-of-the-art result 12.06%.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124927100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color Stability and Homogeneity Regions to Detect Text in Real Scene Images: CSHR","authors":"Houda Gaddour, S. Kanoun, N. Vincent","doi":"10.1109/ICDAR.2017.211","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.211","url":null,"abstract":"In this paper, a novel method called CSHR for affine invariant detection of stable and homogeneous parts of the extremal regions to localize text in natural scene images is proposed. The basic idea of this method was to apply two local thresholds to extract the extremal regions by their color homogeneity and select the candidate regions by maximum and minimum surface limits. Then, the candidate regions were filtered according to a stability criterion to extract the maximally stable parts of the extremal regions. Finally, the text regions are filtered using region area, orientation, and aspect ratio properties as well as features specific to the Arabic language to focus on Arabic writing. The proposed approach which was tested on the ICDAR2003 database and on our database showed an improvement over the existing methods.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122956940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}