{"title":"Smart IDReader: Document Recognition in Video Stream","authors":"K. Bulatov, V. Arlazarov, T. S. Chernov, O. Slavin, D. Nikolaev","doi":"10.1109/ICDAR.2017.347","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.347","url":null,"abstract":"This work is devoted to an identity document recognition system design for use in mobile phones and tablets using the computational capabilities of the device itself. Key differences are discussed in relation to conservative cloud recognition systems which commonly use single images as an input by design. A mobile recognition system chart is presented which is constructed with computational limitations in mind and which is implemented in a commercial solution. An original approach designed to improve recognition precision and reliability using post-OCR results integration in video stream, as opposed to approaches which rely on frame image integration using \"super-resolution\" algorithms. An interactive feedback between the system and its operator is discussed, such as automatic video stream recognition stopping decision. Experimental results are presented for an implemented commercial system \"Smart IDReader\" designed for identity documents recognition.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133263143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Long Term Memory Recognition Framework on Multi-Complexity Motion Gestures","authors":"Songbin Xu, Yang Xue","doi":"10.1109/ICDAR.2017.41","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.41","url":null,"abstract":"Most existing researches on inertial sensor based dynamic motion gestures use deterministic or stochastic methods, however, these models generally possess short term memory so that they only memorize few time steps before and ignore the historical information deeper in time. Furthermore, researchers mainly investigate on the primary level gestures, while gestures with higher complexity are more powerful in expression. In this paper, we implement an end-to-end framework for recognition on multi-complexity dynamic motion gestures using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). Since the lack of available motion database, we collected three databases of motion gestures in different levels of complexity. Motion gesture signals were carefully pre-processed and sent for training without feature extraction. Results on 5-folds cross validation prove that our framework has good recognition and real-time performance on different types of gestures, and shows robustness to the invalid segments, and the time consumption of recognition keeps stable when gesture classes increase.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133290266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Core Region Detection for Off-Line Unconstrained Handwritten Latin Words Using Word Envelops","authors":"Shilpa Pandey, Gaurav Harit","doi":"10.1109/ICDAR.2017.108","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.108","url":null,"abstract":"Zone extraction is acclaimed as a significant pre-processing step in handwriting analysis. This paper presents a new method for separating ascenders and descenders from an unconstrained handwritten word and identifying its core-region. The method estimates correct core-region for complexities like long horizontal strokes, skewed words, first letter capital, hill and dale writing, jumping baselines and words with long descender curves, cursive handwriting, calligraphic words, title case words, very short words as shown in Fig. 1. It extracts two envelops from the word image and selects sample points that constitute the core region envelop. The method is tested on CVL, ICDAR-2013, ICFHR-2012, and IAM benchmark datasets of handwritten words written by multiple writers. We also created our own dataset of 100 words authored by 2 writers comprising all the above mentioned handwriting complexities. Due to non-availability of the Ground Truth for core-region extraction we created it manually for all the datasets. Our work reports an accuracy of 90.16% for correctly identifying all the three zones on 17,100 Latin words written by 802 individuals. Promising results are obtained by our core-region detection method when compared with the current state of the art methods.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131852718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Spatially Embedded Discriminative Part Detectors for Scene Character Recognition","authors":"Yanna Wang, Cunzhao Shi, Baihua Xiao, Chunheng Wang","doi":"10.1109/ICDAR.2017.67","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.67","url":null,"abstract":"Recognizing scene character is extremely challenging due to various interference factors such as character translation, blur and uneven illumination, etc. Considering that characters are composed of a series of parts and different parts attract diverse attentions when people observe a character, we should assign different importance to each part to recognize scene character. In this paper, we propose a discriminative character representation by aggregating the responses of the spatially embedded salient part detectors. Specifically, we first extract the convolution activations from the pre-trained convolutional neural network (CNN). These convolutional activations are considered as the local descriptors of the character parts. Then we learn a set of part detectors and pick the distinctive convolutional activations which respond to the salient parts. Moreover, to alleviate the effect of character translation, rotation and deformation, etc, we assign a response region for each part detector and search the maximal response in this region. Finally, we aggregate the maximal outputs of all the salient part detectors to represent character. The experiments on three datasets show the effectiveness of the proposed method for scene character recognition.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"769 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134303780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating Word String Embeddings and Loss Functions for CNN-Based Word Spotting","authors":"Sebastian Sudholt, G. Fink","doi":"10.1109/ICDAR.2017.87","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.87","url":null,"abstract":"The recent past has seen CNNs take over the field of word spotting. The dominance of these neural networks is fueled by learning to predict a word string embedding for a given input image. While the PHOC (Pyramidal Histogram of Characters) is most prominently used, other embeddings such as the Discrete Cosine Transform of Words have been used as well. In this work, we investigate the use of different word string embeddings for word spotting. For this, we make use of the recently proposed PHOCNet and modify it to be able to not only learn binary representations. Our extensive evaluation shows that a large number of combinations of word string embeddings and loss functions achieve roughly the same results on different word spotting benchmarks. This leads us to the conclusion that no word string embedding is really superior to another and new embeddings should focus on incorporating more information than only character counts and positions.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"184 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115727748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Document Image Dewarping Method Using Text-Lines and Line Segments","authors":"T. Kil, Wonkyo Seo, H. Koo, N. Cho","doi":"10.1109/ICDAR.2017.146","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.146","url":null,"abstract":"Conventional text-line based document dewarping methods have problems when handling complex layout and/or very few text-lines. When there are few aligned text-lines in the image, this usually means that photos, graphics and/or tables take large portion of the input instead. Hence, for the robust document dewarping, we propose to use line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that many of the line segments in the image are horizontally or vertically aligned in the well-rectified images, we encode this property into the cost function in addition to the text-line alignment cost. By minimizing the function, we can obtain transformation parameters for camera pose, page curve, etc., which are used for document rectification. Considering that there are many outliers in line segment directions and missed text-lines in some cases, the overall algorithm is designed in an iterative manner. At each step, we remove text components and line segments that are not well aligned, and then minimize the cost function with the updated information. Experimental results show that the proposed method is robust to the variety of page layouts.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114255735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Document Image Classification Using Deep CNN and Extreme Learning Machines","authors":"Andreas Kölsch, Muhammad Zeshan Afzal, Markus Ebbecke, M. Liwicki","doi":"10.1109/ICDAR.2017.217","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.217","url":null,"abstract":"This paper presents an approach for real-time training and testing for document image classification. In production environments, it is crucial to perform accurate and (time-)efficient training. Existing deep learning approaches for classifying documents do not meet these requirements, as they require much time for training and fine-tuning the deep architectures. Motivated from Computer Vision, we propose a two-stage approach. The first stage trains a deep network that works as feature extractor and in the second stage, Extreme Learning Machines (ELMs) are used for classification. The proposed approach outperforms all previously reported structural and deep learning based methods with a final accuracy of 83.24% on Tobacco-3482 dataset, leading to a relative error reduction of 25% when compared to a previous Convolutional Neural Network (CNN) based approach (DeepDocClassifier). More importantly, the training time of the ELM is only 1.176 seconds and the overall prediction time for 2,482 images is 3.066 seconds. As such, this novel approach makes deep learning-based document classification suitable for large-scale real-time applications.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114100176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic Text Encoding for Text Classification Using Convolutional Neural Networks","authors":"I. Gallo, Shah Nawaz, Alessandro Calefati","doi":"10.1109/ICDAR.2017.323","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.323","url":null,"abstract":"In this paper, we encode semantics of a text document in an image to take advantage of the same Convolutional Neural Networks (CNNs) that have been successfully employed to image classification. We use Word2Vec, which is an estimation of word representation in a vector space that can maintain the semantic and syntactic relationships among words. Word2Vec vectors are transformed into graphical words representing sequence of words in the text document. The encoded images are classified by using the AlexNet architecture. We introduced a new dataset named Text-Ferramenta gathered from an Italian price comparison website and we evaluated the encoding scheme through this dataset along with two publicly available datasets i.e. 20news-bydate and StackOverflow. Our scheme outperforms the text classification approach based on Doc2Vec and Support Vector Machine (SVM) when all the words of a text document can be completely encoded in an image. We believe that the results on these datasets are an interesting starting point for many Natural Language Processing works based on CNNs, such as a multimodal approach that could use a single CNN to classify both image and text information.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115395494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Residual Recurrent Neural Network with Sparse Training for Offline Arabic Handwriting Recognition","authors":"Ruijie Yan, Liangrui Peng, GuangXiang Bin, Shengjin Wang, Yao Cheng","doi":"10.1109/ICDAR.2017.171","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.171","url":null,"abstract":"Deep Recurrent Neural Networks (RNN) have been suffering from the overfitting problem due to the model redundancy of the network structures. We propose a novel temporal and spatial residual learning method for RNN, followed with sparse training by weight pruning to gain sparsity in network parameters. For a Long Short-Term Memory (LSTM) network, we explore the combination schemes and parameter settings for temporal and spatial residual learning with sparse training. Experiments are carried out on the IFN/ENIT database. For the character error rate on the testing set e while training with sets a, b, c, d, the previously reported best result is 13.42%, and the proposed configuration of temporal residual learning followed with sparse training achieves the state-of-the-art result 12.06%.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124927100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color Stability and Homogeneity Regions to Detect Text in Real Scene Images: CSHR","authors":"Houda Gaddour, S. Kanoun, N. Vincent","doi":"10.1109/ICDAR.2017.211","DOIUrl":"https://doi.org/10.1109/ICDAR.2017.211","url":null,"abstract":"In this paper, a novel method called CSHR for affine invariant detection of stable and homogeneous parts of the extremal regions to localize text in natural scene images is proposed. The basic idea of this method was to apply two local thresholds to extract the extremal regions by their color homogeneity and select the candidate regions by maximum and minimum surface limits. Then, the candidate regions were filtered according to a stability criterion to extract the maximally stable parts of the extremal regions. Finally, the text regions are filtered using region area, orientation, and aspect ratio properties as well as features specific to the Arabic language to focus on Arabic writing. The proposed approach which was tested on the ICDAR2003 database and on our database showed an improvement over the existing methods.","PeriodicalId":433676,"journal":{"name":"2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122956940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}