Latest Publications from the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)

A PHOC Decoder for Lexicon-Free Handwritten Word Recognition
Giorgos Sfikas, George Retsinas, B. Gatos
DOI: https://doi.org/10.1109/ICDAR.2017.90
Abstract: In this paper, we propose a novel probabilistic model for lexicon-free handwriting recognition. Model inputs are word images encoded as Pyramidal Histogram Of Character (PHOC) vectors. PHOC vectors have been used as efficient attribute-based, multi-resolution representations of either text strings or word image contents. The proposed model formulates PHOC decoding as the problem of finding the most probable sequence of characters corresponding to the given PHOC. We model PHOC layers as Beta-distributed observations, linked to hidden states that correspond to character estimates. Characters are in turn linked to one another along a Markov chain, encoding language model information. The sequence of characters is estimated using the max-sum algorithm in a process akin to Viterbi decoding. Numerical experiments on the well-known George Washington database show competitive recognition results.
Citations: 2
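The abstract assumes familiarity with the PHOC representation being decoded. Below is a minimal sketch of the standard PHOC construction (binary character-presence bits over a spatial pyramid); the half-overlap assignment rule used here is a common convention from the attribute-embedding literature, not a detail taken from this paper:

```python
def phoc(word, alphabet="abcdefghijklmnopqrstuvwxyz", levels=(1, 2, 3)):
    """Build a binary Pyramidal Histogram Of Characters vector.

    At pyramid level L the word is split into L equal regions; each
    region gets one presence bit per alphabet character.  A character
    occupying normalized span [c_lo, c_hi) is assigned to every region
    it overlaps by at least half of its own width (a common convention).
    """
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            r_lo, r_hi = region / level, (region + 1) / level
            bits = [0] * len(alphabet)
            for i, ch in enumerate(word):
                c_lo, c_hi = i / n, (i + 1) / n
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                if ch in alphabet and overlap >= (c_hi - c_lo) / 2:
                    bits[alphabet.index(ch)] = 1
            vec.extend(bits)
    return vec
```

With levels (1, 2, 3) and a 26-letter alphabet this yields a 156-dimensional binary vector; the paper's decoder inverts such vectors back into character sequences.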
Graph-Based Deep Learning for Graphics Classification
Pau Riba, Anjan Dutta, J. Lladós, A. Fornés
DOI: https://doi.org/10.1109/ICDAR.2017.262
Abstract: Graph-based representations are a common way to deal with graphics recognition problems. However, previous works have mainly focused on developing learning-free techniques. The success of deep learning frameworks has proved that learning is a powerful tool for solving many problems, yet it is not straightforward to extend these methodologies to non-Euclidean data such as graphs. On the other hand, graphs are a good representational structure for graphical entities. In this work, we present some deep learning techniques that have been proposed in the literature for graph-based representations and show how they can be used in graphics recognition problems.
Citations: 5
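The techniques surveyed in this paper build on message passing over graph neighborhoods. As a toy illustration only (a generic GCN-style mean-aggregation round, not any specific model from the paper), each node can update its feature vector by averaging itself with its neighbors:

```python
def aggregate(features, adjacency):
    """One round of mean neighborhood aggregation (GCN-style message
    passing): each node's new feature is the average of its own feature
    and its neighbors' features.

    features:  {node: [float, ...]}  feature vector per node
    adjacency: {node: [neighbor, ...]}  neighbor list per node
    """
    new = {}
    for node, feat in features.items():
        neighbors = adjacency.get(node, [])
        stacked = [feat] + [features[n] for n in neighbors]
        k = len(stacked)
        # Average each feature dimension over the node and its neighbors.
        new[node] = [sum(col) / k for col in zip(*stacked)]
    return new
```

Repeating such rounds (with learned weight matrices and nonlinearities in a real network) propagates structural information across the graph.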
Semantic Text Detection in Born-Digital Images via Fully Convolutional Networks
Nibal Nayef, J. Ogier
DOI: https://doi.org/10.1109/ICDAR.2017.145
Abstract: Traditional layout analysis methods cannot be easily adapted to born-digital images, which carry properties of both regular document images and natural scene images. One layout approach for analyzing born-digital images is to separate the text layer from the graphics layer before further analyzing either of them. In this paper, we propose a method for detecting text regions in such images by casting the detection problem as a semantic object segmentation problem. The text classification is done holistically using fully convolutional networks, where the full image is fed as input to the network and the output is a pixel heat map of the same size as the input image. This solves the problems of low-resolution images and of variable text scale within one image. It also eliminates the need for finding interest points, candidate text locations or low-level components. The experimental evaluation of our method on the ICDAR 2013 dataset shows that it outperforms state-of-the-art methods. The detected text regions also allow the flexibility to later apply methods for finding text components at character, word or text-line level in different orientations.
Citations: 2
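The network's output is a per-pixel heat map that must still be turned into text regions. One simple post-processing sketch (thresholding followed by 4-connected component grouping; the paper does not specify this exact procedure):

```python
from collections import deque

def text_boxes(heatmap, threshold=0.5):
    """Turn a per-pixel text heat map into axis-aligned region boxes.

    heatmap is a list of rows of scores in [0, 1].  Pixels above the
    threshold are grouped into 4-connected components with a BFS, and
    each component is reported as (row_min, col_min, row_max, col_max).
    """
    h, w = len(heatmap), len(heatmap[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if heatmap[r][c] > threshold and not seen[r][c]:
                q = deque([(r, c)])
                seen[r][c] = True
                r0, c0, r1, c1 = r, c, r, c
                while q:
                    y, x = q.popleft()
                    r0, r1 = min(r0, y), max(r1, y)
                    c0, c1 = min(c0, x), max(c1, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and heatmap[ny][nx] > threshold
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((r0, c0, r1, c1))
    return boxes
```

The resulting boxes can then be refined into character-, word- or text-line-level components as the abstract suggests.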
Detection and Recognition of Arabic Text in Video Frames
W. Ohyama, Seiya Iwata, T. Wakabayashi, F. Kimura
DOI: https://doi.org/10.1109/ICDAR.2017.360
Abstract: The authors have developed an end-to-end system for Arabic text recognition in video frames. The system consists of steps for text-line detection, word segmentation and word recognition. In order to achieve high text recognition accuracy, we propose a new integrated text detection-recognition scheme, in which true text-lines are detected with as high a recall rate as possible and false words in false lines are rejected in the subsequent word recognition step. We previously reported a recognition-based transition-frame detection of Arabic news captions in single-channel video images. In this paper the recognition system is integrated with an n-gram language model and extended to text detection/recognition in multi-channel video images. The multi-channel, multi-font performance of the system is experimentally evaluated on the AcTiV-D and AcTiV-R datasets. The multi-channel text detection performance for three channels (France24, Russia Today and TunisiaNat1) is 91.29% in F-measure. The multi-channel, multi-font character recognition performance for these channels is 94.84% in F-measure.
Citations: 2
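Both detection and recognition results above are reported as F-measures, the harmonic mean of precision and recall:

```python
def f_measure(true_positives, false_positives, false_negatives):
    """F-measure (F1): harmonic mean of precision and recall,
    the standard score for detection/recognition benchmarks."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)
```

For example, 8 correct detections with 2 false alarms and 2 misses gives precision 0.8, recall 0.8, and F-measure 0.8.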
Local Enlacement Histograms for Historical Drop Caps Style Recognition
Michaël Clément, Mickaël Coustaty, Camille Kurtz, L. Wendling
DOI: https://doi.org/10.1109/ICDAR.2017.57
Abstract: This article focuses on the specific issue of drop-caps image recognition in the context of cultural heritage preservation. Due to their heterogeneity and weakly structured properties, these historical images represent challenging data. An important aspect in the recognition of drop caps is their background style, which can be considered a discriminative feature for identifying both the printer and the period. Most existing methods for style recognition are based on low-level features such as color or texture properties. In this article, we present a novel framework for the recognition of drop-caps styles based on higher-level features. We propose to capture the spatial structure carried by these images using relative position descriptors that model the enlacement between local cells of pixel layers obtained from a document segmentation step. These descriptors are then exploited in an efficient bag-of-features learning procedure. Experimental results obtained on a dataset of historical drop-caps images highlight the merit of this approach, and in particular the benefit of considering spatial information.
Citations: 4
Nonlinear Manifold Embedding on Keyword Spotting Using t-SNE
George Retsinas, N. Stamatopoulos, G. Louloudis, Giorgos Sfikas, B. Gatos
DOI: https://doi.org/10.1109/ICDAR.2017.86
Abstract: Nonlinear manifold embedding has attracted considerable attention due to its highly desired property of efficiently encoding local structure, i.e. intrinsic space properties, into a low-dimensional space. The benefit of such an approach is twofold: it leads to compact representations while addressing the often-encountered curse of dimensionality. The latter plays an important role in retrieval applications, such as keyword spotting, where a sorted list of retrieved objects with respect to a distance metric is required. In this work, we explore the efficiency of the popular manifold embedding method t-distributed Stochastic Neighbor Embedding (t-SNE) on the query-by-example keyword spotting task. The main contribution of this work is the extension of t-SNE to support out-of-sample (OOS) embedding, which is essential for mapping query images into the embedding space. The experimental results demonstrate a significant increase in keyword spotting performance when word similarity is calculated in the embedding space.
Citations: 6
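Standard t-SNE has no built-in map for unseen points, which is why OOS embedding matters for query images. A common baseline heuristic, shown below purely for illustration, places a query at the inverse-distance-weighted average of its nearest training neighbors' embeddings; the paper derives its own extension of t-SNE, which is a different mechanism:

```python
def oos_embed(query, train_feats, train_embeds, k=3):
    """Embed an out-of-sample feature vector by averaging the
    embeddings of its k nearest training neighbors, weighted by
    inverse distance (a simple OOS heuristic, not the paper's method).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Indices of the k training samples closest to the query.
    nearest = sorted(range(len(train_feats)),
                     key=lambda i: dist(query, train_feats[i]))[:k]
    weights = [1.0 / (dist(query, train_feats[i]) + 1e-8) for i in nearest]
    total = sum(weights)
    dim = len(train_embeds[0])
    return [sum(w * train_embeds[i][d] for w, i in zip(weights, nearest)) / total
            for d in range(dim)]
```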
A GRU-Based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition
Jianshu Zhang, Jun Du, Lirong Dai
DOI: https://doi.org/10.1109/ICDAR.2017.152
Abstract: In this study, we present a novel end-to-end approach based on the encoder-decoder framework with an attention mechanism for online handwritten mathematical expression recognition (OHMER). First, the input two-dimensional ink trajectory of the handwritten expression is encoded via a gated recurrent unit based recurrent neural network (GRU-RNN). The decoder is also implemented as a GRU-RNN, with a coverage-based attention model. The proposed approach can simultaneously accomplish symbol recognition and structural analysis to output a character sequence in LaTeX format. Validated on the CROHME 2014 competition task, our approach significantly outperforms the state of the art with an expression recognition accuracy of 52.43% using only the official training dataset. Furthermore, the alignments between the input trajectories of handwritten expressions and the output LaTeX sequences are visualized via the attention mechanism to show the effectiveness of the proposed method.
Citations: 50
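At each decoding step, attention turns per-timestep alignment scores into a softmax distribution over encoder states and returns their weighted sum as the context vector. A minimal sketch of that core step (generic attention only; the GRU gates and the coverage term of the paper's model are omitted, and the function name is our own):

```python
import math

def attention_context(scores, encoder_states):
    """Softmax the alignment scores into attention weights (alphas)
    and return (alphas, context), where context is the weighted sum
    of the encoder states."""
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, encoder_states))
               for d in range(dim)]
    return alphas, context
```

The visualized alignments mentioned in the abstract are exactly these alpha weights, plotted against the input trajectory.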
Lexicographical-Based Order for Post-OCR Correction of Named Entities
Axel Jean-Caurant, Nouredine Tamani, V. Courboulay, J. Burie
DOI: https://doi.org/10.1109/ICDAR.2017.197
Abstract: We are in an era of information access in which huge amounts of text are extracted from scanned documents and made available digitally for use in search processes. However, old or poorly scanned documents suffer from bad recognition, which leads not only to imperfect Optical Character Recognition (OCR) output but also to poor indexing and inaccessible information. To cope with these issues, we introduce in this paper a lexicographical approach to post-OCR correction applied to named entities. By lexicographically combining a contextual similarity and an edit distance, the approach builds a graph connecting similar named entities in order to automatically correct the corresponding OCR-processed text. We evaluated our approach on a generated dataset. The first results show that, despite the high level of degradation of the text, the approach succeeded in correcting more than a third of the named entities without the need for any external knowledge.
Citations: 6
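The similarity graph combines a contextual similarity with an edit distance. A sketch of the edit-distance half, assuming a plain uniform-cost Levenshtein distance and a hypothetical `max_dist` threshold (the paper's actual combination with contextual similarity is not reproduced here):

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def similarity_graph(entities, max_dist=2):
    """Connect named-entity strings whose edit distance is small;
    each connected pair is a candidate for mutual OCR correction."""
    edges = []
    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            if levenshtein(entities[i], entities[j]) <= max_dist:
                edges.append((entities[i], entities[j]))
    return edges
```

For instance, the OCR variants "Paris" and "Par1s" are one substitution apart and would be linked, while unrelated entities stay disconnected.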
Evaluation of Texture Descriptors for Validation of Counterfeit Documents
Albert Berenguel Centeno, O. R. Terrades, Josep Lladós Canet, Cristina Cañero Morales
DOI: https://doi.org/10.1109/ICDAR.2017.204
Abstract: This paper describes an exhaustive comparative analysis and evaluation of existing texture descriptor algorithms for differentiating between genuine and counterfeit documents. We include different categories of algorithms in our experiments and compare them in different scenarios on several counterfeit datasets comprising banknotes and identity documents. The computational time of extracting each descriptor is important because the final objective is to use it in a real industrial scenario. HoG- and CNN-based descriptors stand out statistically over the rest in terms of F1-score/time-ratio performance.
Citations: 13
Into the Colorful World of Webtoons: Through the Lens of Neural Networks
Ceyda Cinarel, Byoung-Tak Zhang
DOI: https://doi.org/10.1109/ICDAR.2017.289
Abstract: The task of colorizing black-and-white images has previously been explored for natural images. In this paper we look at the task of colorization in a different domain: webtoons. To our knowledge, this type of dataset has not been used before. Webtoons are usually produced in color, so they make a good dataset for analyzing different colorization models. Comics such as webtoons also present additional challenges over natural images, such as occlusion by speech bubbles and text. First we examine the performance of previously introduced models on this task and suggest modifications to address their problems. We then propose a new model composed of two networks: one network generates sparse color information, and a second network uses this generated color information as input to apply color to the whole image. The two networks are trained end-to-end. Our proposed model solves some of the problems observed with other architectures, resulting in better colorizations.
Citations: 3