2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR): Latest Publications

Smart IDReader: Document Recognition in Video Stream
K. Bulatov, V. Arlazarov, T. S. Chernov, O. Slavin, D. Nikolaev
DOI: 10.1109/ICDAR.2017.347
Abstract: This work is devoted to the design of an identity document recognition system for mobile phones and tablets that uses the computational capabilities of the device itself. Key differences are discussed in relation to conventional cloud recognition systems, which by design commonly use single images as input. A mobile recognition system architecture is presented which is constructed with computational limitations in mind and which is implemented in a commercial solution. An original approach is presented that improves recognition precision and reliability by integrating post-OCR results across the video stream, as opposed to approaches that rely on frame image integration using "super-resolution" algorithms. Interactive feedback between the system and its operator is discussed, such as the decision to automatically stop video stream recognition. Experimental results are presented for "Smart IDReader", an implemented commercial system for identity document recognition.
Citations: 55
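The per-frame integration idea lends itself to a small sketch. Below is a minimal, illustrative take on combining per-frame recognition confidences for one text field and stopping the stream once the integrated hypothesis is confident; this is not the paper's actual algorithm, and the function name, thresholds, and averaging scheme are assumptions.

```python
import numpy as np

def integrate_field(frame_probs, stop_conf=0.98, min_frames=3):
    """Running average of per-frame class confidences for one text field,
    with a simple stopping rule once the integrated hypothesis is sure.
    frame_probs: iterable of (n_classes,) arrays from the per-frame OCR
    (assumes at least one frame is provided)."""
    acc = None
    for t, p in enumerate(frame_probs, start=1):
        acc = p.astype(float) if acc is None else acc + p
        post = acc / t
        if t >= min_frames and post.max() >= stop_conf:
            return int(post.argmax()), t   # decision and frames consumed
    return int(post.argmax()), t           # stream ended without early stop
```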
A Long Term Memory Recognition Framework on Multi-Complexity Motion Gestures
Songbin Xu, Yang Xue
DOI: 10.1109/ICDAR.2017.41
Abstract: Most existing research on inertial-sensor-based dynamic motion gestures uses deterministic or stochastic methods; however, these models generally possess only short-term memory, so they memorize just a few preceding time steps and ignore historical information deeper in time. Furthermore, researchers have mainly investigated primary-level gestures, while gestures of higher complexity are more expressive. In this paper, we implement an end-to-end framework for recognizing multi-complexity dynamic motion gestures using a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). Given the lack of available motion databases, we collected three databases of motion gestures at different levels of complexity. Motion gesture signals were carefully pre-processed and sent for training without feature extraction. Results of 5-fold cross-validation show that our framework achieves good recognition and real-time performance on different types of gestures, is robust to invalid segments, and keeps recognition time stable as the number of gesture classes increases.
Citations: 13
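As a rough illustration of the end-to-end setup described above, here is a minimal LSTM classifier over raw inertial channels in PyTorch; the input dimensionality (6 accelerometer/gyroscope channels), hidden size, and class count are assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    """Sequence of raw inertial samples in, gesture class logits out."""
    def __init__(self, n_channels=6, hidden=128, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):             # x: (batch, time, n_channels)
        out, _ = self.lstm(x)         # out: (batch, time, hidden)
        return self.head(out[:, -1])  # classify from the last time step

logits = GestureLSTM()(torch.randn(4, 200, 6))  # 4 gestures, 200 samples each
```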
Core Region Detection for Off-Line Unconstrained Handwritten Latin Words Using Word Envelops
Shilpa Pandey, Gaurav Harit
DOI: 10.1109/ICDAR.2017.108
Abstract: Zone extraction is acknowledged as a significant pre-processing step in handwriting analysis. This paper presents a new method for separating ascenders and descenders from an unconstrained handwritten word and identifying its core region. The method estimates the correct core region under complexities such as long horizontal strokes, skewed words, capitalized first letters, hill-and-dale writing, jumping baselines, words with long descender curves, cursive handwriting, calligraphic words, title-case words, and very short words, as shown in Fig. 1. It extracts two envelopes from the word image and selects the sample points that constitute the core-region envelope. The method is tested on the CVL, ICDAR-2013, ICFHR-2012, and IAM benchmark datasets of handwritten words written by multiple writers. We also created our own dataset of 100 words authored by 2 writers, covering all the above-mentioned handwriting complexities. Due to the non-availability of ground truth for core-region extraction, we created it manually for all the datasets. Our work reports an accuracy of 90.16% for correctly identifying all three zones on 17,100 Latin words written by 802 individuals. Our core-region detection method obtains promising results compared with current state-of-the-art methods.
Citations: 0
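The envelope extraction step is easy to picture in a few lines of NumPy. This is only a naive stand-in: the paper selects sample points from the envelopes rather than taking column-wise extremes and medians as below.

```python
import numpy as np

def envelopes(binary):
    """Upper/lower word envelopes: first and last foreground row per
    ink-bearing column of a binary word image (foreground == 1)."""
    upper, lower = [], []
    for col in binary.T:
        fg = np.flatnonzero(col)
        if fg.size:
            upper.append(fg[0])
            lower.append(fg[-1])
    return np.asarray(upper), np.asarray(lower)

def core_band(upper, lower):
    """Naive core-region estimate: the band between the median envelope
    heights, which ascenders and descenders pull away from only locally."""
    return int(np.median(upper)), int(np.median(lower))
```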
Learning Spatially Embedded Discriminative Part Detectors for Scene Character Recognition
Yanna Wang, Cunzhao Shi, Baihua Xiao, Chunheng Wang
DOI: 10.1109/ICDAR.2017.67
Abstract: Recognizing scene characters is extremely challenging due to interference factors such as character translation, blur, and uneven illumination. Considering that characters are composed of a series of parts and that different parts attract different attention when people observe a character, different importance should be assigned to each part when recognizing scene characters. In this paper, we propose a discriminative character representation that aggregates the responses of spatially embedded salient part detectors. Specifically, we first extract convolutional activations from a pre-trained convolutional neural network (CNN); these activations are treated as local descriptors of the character parts. We then learn a set of part detectors and pick the distinctive convolutional activations that respond to the salient parts. Moreover, to alleviate the effect of character translation, rotation, and deformation, we assign a response region to each part detector and search for the maximal response within this region. Finally, we aggregate the maximal outputs of all the salient part detectors to represent the character. Experiments on three datasets show the effectiveness of the proposed method for scene character recognition.
Citations: 1
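A compact sketch of the response-region max-pooling described above, assuming the conv activations and learned part detectors are already given; the window radius and the grid of anchor centers are illustrative choices, not the paper's.

```python
import numpy as np

def aggregate_parts(feats, detectors, centers, r=2):
    """feats: (C, H, W) conv activations from a pre-trained CNN;
    detectors: (K, C) learned part filters; centers: (K, 2) anchors.
    Each detector keeps only its max response inside a (2r+1)^2 window
    around its anchor, tolerating small translations/deformations."""
    resp = np.einsum('kc,chw->khw', detectors, feats)  # per-part response maps
    H, W = resp.shape[1:]
    out = np.empty(len(detectors))
    for k, (cy, cx) in enumerate(centers):
        win = resp[k, max(cy - r, 0):min(cy + r + 1, H),
                      max(cx - r, 0):min(cx + r + 1, W)]
        out[k] = win.max()                             # max response in region
    return out                                         # (K,) character descriptor
```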
Evaluating Word String Embeddings and Loss Functions for CNN-Based Word Spotting
Sebastian Sudholt, G. Fink
DOI: 10.1109/ICDAR.2017.87
Abstract: The recent past has seen CNNs take over the field of word spotting. The dominance of these neural networks is fueled by learning to predict a word string embedding for a given input image. While the PHOC (Pyramidal Histogram of Characters) is most prominently used, other embeddings such as the Discrete Cosine Transform of Words have been used as well. In this work, we investigate the use of different word string embeddings for word spotting. For this, we make use of the recently proposed PHOCNet and modify it so that it can learn more than binary representations. Our extensive evaluation shows that a large number of combinations of word string embeddings and loss functions achieve roughly the same results on different word spotting benchmarks. This leads us to the conclusion that no word string embedding is really superior to another, and that new embeddings should focus on incorporating more information than character counts and positions alone.
Citations: 49
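For reference, the PHOC embedding discussed above can be built directly from a word string. The sketch below uses the common construction, in which a character is assigned to a pyramid region if its occupancy interval overlaps that region by at least 50%; the level set and alphabet are typical choices, not necessarily those of the paper.

```python
def phoc(word, alphabet="abcdefghijklmnopqrstuvwxyz", levels=(2, 3, 4, 5)):
    """Binary Pyramidal Histogram of Characters for a non-empty word."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for s in range(level):                    # s-th region at this level
            lo, hi = s / level, (s + 1) / level
            present = set()
            for i, ch in enumerate(word):
                c0, c1 = i / n, (i + 1) / n       # character occupancy interval
                overlap = min(hi, c1) - max(lo, c0)
                if overlap / (c1 - c0) >= 0.5:    # 50% overlap rule
                    present.add(ch)
            vec += [1 if a in present else 0 for a in alphabet]
    return vec

assert len(phoc("icdar")) == 26 * (2 + 3 + 4 + 5)  # 364-dimensional
```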
Robust Document Image Dewarping Method Using Text-Lines and Line Segments
T. Kil, Wonkyo Seo, H. Koo, N. Cho
DOI: 10.1109/ICDAR.2017.146
Abstract: Conventional text-line-based document dewarping methods have problems when handling complex layouts and/or very few text-lines. When there are few aligned text-lines in the image, photos, graphics, and/or tables usually take up a large portion of the input instead. Hence, for robust document dewarping, we propose to use line segments in the image in addition to the aligned text-lines. Based on the assumption and observation that many of the line segments in a well-rectified image are horizontally or vertically aligned, we encode this property into the cost function in addition to the text-line alignment cost. By minimizing the function, we obtain transformation parameters for camera pose, page curve, etc., which are used for document rectification. Considering that line-segment directions contain many outliers and that text-lines are sometimes missed, the overall algorithm is designed in an iterative manner: at each step, we remove text components and line segments that are not well aligned, and then minimize the cost function with the updated information. Experimental results show that the proposed method is robust to a variety of page layouts.
Citations: 28
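The combined cost the authors minimize can be caricatured in a few lines: text-lines should come out horizontal and line segments axis-aligned. The real method optimizes camera-pose and page-curve parameters iteratively with outlier removal; the quadratic penalties and weighting below are assumptions for illustration.

```python
import numpy as np

def rectification_cost(line_angles, seg_angles, lam=1.0):
    """line_angles: text-line orientations (radians) under a candidate
    rectification; seg_angles: line-segment orientations.  Text-lines
    are pushed toward horizontal; each segment is penalized by its
    deviation from the nearest axis (0 or pi/2)."""
    text_cost = np.sum(np.sin(line_angles) ** 2)
    a = np.mod(seg_angles, np.pi / 2)
    seg_cost = np.sum(np.minimum(a, np.pi / 2 - a) ** 2)
    return text_cost + lam * seg_cost
```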
Real-Time Document Image Classification Using Deep CNN and Extreme Learning Machines
Andreas Kölsch, Muhammad Zeshan Afzal, Markus Ebbecke, M. Liwicki
DOI: 10.1109/ICDAR.2017.217
Abstract: This paper presents an approach for real-time training and testing for document image classification. In production environments, it is crucial to perform accurate and (time-)efficient training. Existing deep learning approaches for classifying documents do not meet these requirements, as they require much time for training and fine-tuning the deep architectures. Motivated by Computer Vision, we propose a two-stage approach: the first stage trains a deep network that works as a feature extractor, and in the second stage, Extreme Learning Machines (ELMs) are used for classification. The proposed approach outperforms all previously reported structural and deep learning based methods with a final accuracy of 83.24% on the Tobacco-3482 dataset, a relative error reduction of 25% compared to a previous Convolutional Neural Network (CNN) based approach (DeepDocClassifier). More importantly, the training time of the ELM is only 1.176 seconds, and the overall prediction time for 2,482 images is 3.066 seconds. As such, this novel approach makes deep learning-based document classification suitable for large-scale real-time applications.
Citations: 61
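The second stage is a classic Extreme Learning Machine, which is why training takes about a second: the hidden layer is random and only the output weights are solved, in closed form via a pseudo-inverse. A minimal NumPy version follows; the dimensions and activation are assumptions, not the paper's exact configuration.

```python
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine: random input
    weights, output weights solved in closed form (no iterative training)."""
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_in, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.beta = np.zeros((n_hidden, n_out))

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, Y):                    # Y: one-hot targets
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y   # least-squares output weights
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# e.g. ELM(4096, 2000, 10).fit(cnn_feats, onehot).predict(test_feats)
```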
Semantic Text Encoding for Text Classification Using Convolutional Neural Networks
I. Gallo, Shah Nawaz, Alessandro Calefati
DOI: 10.1109/ICDAR.2017.323
Abstract: In this paper, we encode the semantics of a text document in an image in order to take advantage of the same Convolutional Neural Networks (CNNs) that have been successfully employed for image classification. We use Word2Vec, which estimates word representations in a vector space that maintains the semantic and syntactic relationships among words. Word2Vec vectors are transformed into graphical words representing the sequence of words in the text document. The encoded images are classified using the AlexNet architecture. We introduce a new dataset named Text-Ferramenta, gathered from an Italian price-comparison website, and we evaluate the encoding scheme on this dataset along with two publicly available datasets, i.e., 20news-bydate and StackOverflow. Our scheme outperforms a text classification approach based on Doc2Vec and a Support Vector Machine (SVM) when all the words of a text document can be completely encoded in an image. We believe that the results on these datasets are an interesting starting point for many Natural Language Processing works based on CNNs, such as a multimodal approach that could use a single CNN to classify both image and text information.
Citations: 9
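One plausible reading of the encoding step, kept deliberately simple: each word embedding becomes a small grayscale tile, and tiles are laid out in reading order to form a CNN-ready image. The paper's actual graphical-word encoding differs in detail; the tile and grid sizes here are invented for illustration.

```python
import numpy as np

def encode_text(vectors, tile=4, grid=(16, 16)):
    """vectors: list of 1-D word embeddings (e.g. from Word2Vec).
    Each embedding is min-max scaled to [0, 255] and drawn as a
    (tile x tile) patch; patches fill a fixed grid row-major."""
    img = np.zeros((grid[0] * tile, grid[1] * tile), dtype=np.uint8)
    for i, v in enumerate(vectors[: grid[0] * grid[1]]):
        lo, hi = v.min(), v.max()
        patch = ((v - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
        patch = np.resize(patch, (tile, tile))  # repeat/truncate to tile shape
        r, c = divmod(i, grid[1])
        img[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile] = patch
    return img
```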
Residual Recurrent Neural Network with Sparse Training for Offline Arabic Handwriting Recognition
Ruijie Yan, Liangrui Peng, GuangXiang Bin, Shengjin Wang, Yao Cheng
DOI: 10.1109/ICDAR.2017.171
Abstract: Deep Recurrent Neural Networks (RNNs) suffer from overfitting due to the model redundancy of their network structures. We propose a novel temporal and spatial residual learning method for RNNs, followed by sparse training via weight pruning to obtain sparsity in the network parameters. For a Long Short-Term Memory (LSTM) network, we explore the combination schemes and parameter settings for temporal and spatial residual learning with sparse training. Experiments are carried out on the IFN/ENIT database. For the character error rate on testing set e while training with sets a, b, c, and d, the previously reported best result is 13.42%; the proposed configuration of temporal residual learning followed by sparse training achieves a state-of-the-art result of 12.06%.
Citations: 9
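The sparse-training step rests on magnitude pruning: drop the smallest weights and keep a fixed mask so pruned connections stay zero through subsequent updates. A minimal sketch, with the target sparsity chosen arbitrarily rather than taken from the paper:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the smallest-magnitude fraction of weights.  The mask is
    reapplied after every optimizer step during sparse training so that
    pruned connections remain dead."""
    thresh = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) >= thresh).astype(weights.dtype)
    return weights * mask, mask
```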
Color Stability and Homogeneity Regions to Detect Text in Real Scene Images: CSHR
Houda Gaddour, S. Kanoun, N. Vincent
DOI: 10.1109/ICDAR.2017.211
Abstract: In this paper, we propose a novel method called CSHR for affine-invariant detection of stable and homogeneous parts of extremal regions to localize text in natural scene images. The basic idea is to apply two local thresholds to extract extremal regions by their color homogeneity and to select candidate regions using minimum and maximum area limits. The candidate regions are then filtered according to a stability criterion to extract the maximally stable parts of the extremal regions. Finally, the text regions are filtered using region area, orientation, and aspect-ratio properties, as well as features specific to the Arabic language, in order to focus on Arabic writing. The proposed approach, tested on the ICDAR2003 database and on our own database, shows an improvement over existing methods.
Citations: 0
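The stability criterion is reminiscent of MSER: track one region's area across threshold levels and keep the levels where the relative area change is small. A toy version of that filter, with the delta and rate threshold chosen arbitrarily rather than taken from the paper:

```python
import numpy as np

def stable_levels(areas, delta=2, max_rate=0.25):
    """areas[t] = area of one connected region tracked across threshold
    levels t.  A level counts as stable when the relative area change
    across +/- delta levels stays below max_rate (MSER-style criterion)."""
    areas = np.asarray(areas, dtype=float)
    rates = np.abs(areas[2 * delta:] - areas[:-2 * delta]) / areas[delta:-delta]
    return np.flatnonzero(rates <= max_rate) + delta

# e.g. stable_levels([50, 52, 53, 55, 120, 300]) -> early, slowly-growing levels
```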