Latest Publications: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR)

Segmentation-Free Speech Text Recognition for Comic Books
Christophe Rigaud, J. Burie, J. Ogier
DOI: 10.1109/ICDAR.2017.288 (https://doi.org/10.1109/ICDAR.2017.288)
Published: 2017-11-09
Abstract: Speech text in comic books is written in a particular manner by the scriptwriter, which raises unusual challenges for text recognition. We first detail these challenges and present different approaches to solve them. We compare the performance of a pre-trained OCR and a segmentation-free approach on speech text from comic books written in Latin script. We demonstrate that a few good-quality pre-trained OCR output samples, combined with other unlabeled data in the same writing style, can feed a segmentation-free OCR and improve text recognition, thanks to a lexicality measure that automatically accepts or rejects the pre-trained OCR output as pseudo ground truth for subsequent segmentation-free OCR training and recognition.
Citations: 15
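
The lexicality-based filtering described in the abstract can be sketched as follows: an OCR transcript is kept as pseudo ground truth only when a large enough fraction of its tokens appears in a reference lexicon. This is a minimal Python illustration under assumed details; the tokenization and the 0.8 threshold are my choices, not values from the paper.

```python
# Hypothetical sketch of a lexicality filter: keep OCR output as pseudo
# ground truth only if enough of its tokens are real lexicon words.
# Tokenization and threshold are assumptions, not the paper's exact measure.
def lexicality(transcript, lexicon):
    """Fraction of tokens found in the lexicon (0.0 for an empty line)."""
    tokens = [t.strip(".,!?\"'").lower() for t in transcript.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in lexicon for t in tokens) / len(tokens)

def select_pseudo_ground_truth(ocr_lines, lexicon, threshold=0.8):
    """Keep (image, transcript) pairs whose transcript looks lexical enough."""
    return [(img, txt) for img, txt in ocr_lines
            if lexicality(txt, lexicon) >= threshold]
```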
Preparatory KWS Experiments for Large-Scale Indexing of a Vast Medieval Manuscript Collection in the HIMANIS Project
Théodore Bluche, Sébastien Hamel, Christopher Kermorvant, J. Puigcerver, D. Stutzmann, A. Toselli, E. Vidal
DOI: 10.1109/ICDAR.2017.59 (https://doi.org/10.1109/ICDAR.2017.59)
Published: 2017-11-09
Abstract: Making large-scale collections of digitized historical documents searchable is being earnestly demanded by many archives and libraries. Probabilistically indexing the text images of these collections by means of keyword spotting techniques is currently seen as perhaps the only feasible approach to meet this demand. A vast medieval manuscript collection, written in both Latin and French, called "Chancery", is currently being considered for indexing at large. In addition to its bilingual nature, one of the major difficulties of this collection is the very high rate of abbreviated words, which, on the other hand, are completely expanded in the available ground-truth transcripts. In preparation for full indexing of Chancery, experiments have been carried out on a relatively small but fully representative subset of this collection. To this end, a keyword spotting approach has been adopted which computes word relevance probabilities using character lattices produced by a recurrent neural network and an N-gram character language model. Results confirm the viability of the chosen approach for the large-scale indexing aimed at, and show the ability of the proposed modeling and training approaches to properly deal with the abbreviation difficulties mentioned.
Citations: 38
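
The word relevance probability used here sums the posterior probability mass of all lattice hypotheses containing the keyword. As a rough sketch, an n-best list of (transcript, log-probability) pairs can stand in for the full character lattice; this is an approximation for illustration, not the authors' exact computation.

```python
import math

# Simplified keyword relevance scoring: the paper sums over character
# lattices; here an n-best list approximates the lattice.
def relevance_probability(nbest, keyword):
    """P(keyword occurs in line) ~= posterior mass of hypotheses containing it."""
    log_probs = [lp for _, lp in nbest]
    m = max(log_probs)                                   # for numerical stability
    z = sum(math.exp(lp - m) for lp in log_probs)        # normalizer
    mass = sum(math.exp(lp - m) for txt, lp in nbest
               if keyword in txt.split())
    return mass / z

nbest = [("pro rege nostro", -3.2), ("pro lege nostro", -4.0)]
print(relevance_probability(nbest, "rege"))              # ~0.69
```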
A Spatial Domain Steganography for Grayscale Documents Using Pattern Recognition Techniques
J. Burie, J. Ogier, Cu Vinh Loc
DOI: 10.1109/ICDAR.2017.391 (https://doi.org/10.1109/ICDAR.2017.391)
Published: 2017-11-09
Abstract: Steganography is an effective way to hide a secret message in a document image, with the objective of providing authenticity for transmitted documents. Steganography has been widely used for natural images, but little research has been carried out on applying this strategy to document images. In this study, we propose a novel data hiding scheme that embeds secret information of moderate length by taking advantage of pattern recognition techniques. First, the potential feature points used for constructing embedding regions are identified using the Speeded-Up Robust Features (SURF) detector. Second, Local Binary Patterns (LBP) are used to find embedding patterns inside each embedding region, and Local Ternary Patterns (LTP) are then exploited to locate the stable embedding positions inside these patterns, in which the secret bits are embedded. Finally, to make the scheme robust against document rotation caused by distortion from the printing and scanning process, the Hough transform is applied to compute the rotation angle and restore a rotated document to its original orientation. In addition, repetition codes and other refinements are implemented to further improve the accuracy of the extracted secret data. The proposed spatial-domain steganography scheme is capable of detecting embedded data without any reference and of resisting common image-processing distortions.
Citations: 7
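
The LBP and LTP codes this scheme builds on are simple texture descriptors over a 3x3 neighborhood. A minimal sketch of both follows; the paper's actual embedding-region construction and position selection are more involved, and the tolerance t is an illustrative value.

```python
import numpy as np

# Minimal 3x3 LBP/LTP sketch; clockwise neighbor order from the top-left.
def lbp_code(patch):
    """8-bit LBP code: each neighbor >= center contributes one bit."""
    c = patch[1, 1]
    neighbors = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    bits = (neighbors >= c).astype(np.uint8)
    return int(np.dot(bits, 1 << np.arange(8)))

def ltp_code(patch, t=5):
    """Ternary code: +1 / 0 / -1 per neighbor, within tolerance t of the center."""
    c = int(patch[1, 1])
    neighbors = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]].astype(int)
    return np.where(neighbors >= c + t, 1, np.where(neighbors <= c - t, -1, 0))
```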
LSDE: Levenshtein Space Deep Embedding for Query-by-String Word Spotting
L. G. I. Bigorda, Marçal Rusiñol, Dimosthenis Karatzas
DOI: 10.1109/ICDAR.2017.88 (https://doi.org/10.1109/ICDAR.2017.88)
Published: 2017-11-01
Abstract: In this paper we present the LSDE string representation and its application to handwritten word spotting. LSDE is a novel embedding approach for representing strings that learns a space in which distances between projected points are correlated with the Levenshtein edit distance between the original strings. We show how such a representation produces retrieval that is more semantically interpretable from the user's perspective than other state-of-the-art representations such as PHOC and DCToW. We also conduct a preliminary handwritten word spotting experiment on the George Washington dataset.
Citations: 25
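
The training signal behind an embedding like this pairs distances in the learned space with true edit distances. The sketch below shows the classic dynamic-programming Levenshtein distance and a squared-error pair loss against it; `embed` is a stand-in for the learned encoder, which the paper implements as a deep network.

```python
import numpy as np

# Sketch of the LSDE-style training signal: embedding distances are
# regressed onto Levenshtein distances. The encoder `embed` is assumed.
def levenshtein(a, b):
    """Edit distance via the classic single-row dynamic program."""
    d = np.arange(len(b) + 1)
    for i, ca in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, cb in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (ca != cb))
    return int(d[-1])

def lsde_pair_loss(embed, a, b):
    """Squared error between embedded distance and true edit distance."""
    ea, eb = embed(a), embed(b)
    return float((np.linalg.norm(ea - eb) - levenshtein(a, b)) ** 2)

print(levenshtein("kitten", "sitting"))   # 3
```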
Are Multidimensional Recurrent Layers Really Necessary for Handwritten Text Recognition?
J. Puigcerver
DOI: 10.1109/ICDAR.2017.20 (https://doi.org/10.1109/ICDAR.2017.20)
Published: 2017-11-01
Abstract: Current state-of-the-art approaches to offline handwritten text recognition rely extensively on multidimensional Long Short-Term Memory networks. However, these architectures come with a considerable computational cost, and we observe that they extract features visually similar to those of convolutional layers, which are computationally cheaper. This suggests that the two-dimensional long-term dependencies potentially modeled by multidimensional recurrent layers may not be essential for good recognition accuracy, at least in the lower layers of the architecture. In this work, an alternative model is explored that relies only on convolutional and one-dimensional recurrent layers, achieves results better than or equivalent to those of the current state-of-the-art architecture, and runs significantly faster. In addition, we observe that using random distortions during training as synthetic data augmentation dramatically improves the accuracy of our model. Thus, are multidimensional recurrent layers really necessary for handwritten text recognition? Probably not.
Citations: 205
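
The alternative architecture, convolutional layers followed by one-dimensional bidirectional recurrence over the width axis, can be sketched as below. Layer counts and sizes here are illustrative, not the exact configuration reported in the paper; the output is per-timestep logits suitable for CTC training.

```python
import torch
import torch.nn as nn

# Illustrative conv + 1D-BiLSTM model: convolutions extract features,
# then the reduced image width is treated as the recurrent time axis.
class ConvBLSTM(nn.Module):
    def __init__(self, num_classes, height=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        feat = 32 * (height // 4)                  # channels x reduced height
        self.rnn = nn.LSTM(feat, 256, num_layers=2,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(2 * 256, num_classes)  # per-timestep logits for CTC

    def forward(self, x):                          # x: (B, 1, H, W)
        f = self.conv(x)                           # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)   # width as time axis
        out, _ = self.rnn(f)
        return self.fc(out)                        # (B, W/4, num_classes)
```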
Compact and Efficient WFST-Based Decoders for Handwriting Recognition
Meng Cai, Qiang Huo
DOI: 10.1109/ICDAR.2017.32 (https://doi.org/10.1109/ICDAR.2017.32)
Published: 2017-11-01
Abstract: We present two weighted finite-state transducer (WFST) based decoders for handwriting recognition. One decoder is a cloud-based solution that is both compact and efficient; the other is a device-based solution with a small memory footprint. A compact WFST data structure is proposed for the cloud-based decoder in which no output labels are stored on the transitions. A decoder based on this compact WFST produces the same result with a significantly smaller footprint than a decoder based on the corresponding standard WFST. For the device-based decoder, on-the-fly language model rescoring is performed to reduce the footprint. Careful engineering methods, such as WFST weight quantization and token and data type refinement, are also explored. When using a language model containing 600,000 n-grams, the cloud-based decoder achieves an average decoding time of 4.04 ms per text line with a peak footprint of 114.4 MB, while the device-based decoder achieves an average decoding time of 13.47 ms per text line with a peak footprint of 31.6 MB.
Citations: 9
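
One of the engineering techniques mentioned, weight quantization, trades a little precision for a 4x reduction in per-weight storage. The abstract does not spell out the scheme used, so the following is a generic linear quantizer mapping float transition weights to 8-bit codes plus a shared scale and offset, offered only to illustrate the idea.

```python
import numpy as np

# Generic linear weight quantization: floats -> uint8 codes + (offset, scale).
# Storage drops from 4 bytes to 1 byte per weight; error is bounded by scale/2.
def quantize(weights):
    lo, hi = float(weights.min()), float(weights.max())
    scale = (hi - lo) / 255.0 or 1.0          # guard against constant weights
    codes = np.round((weights - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.astype(np.float32) * scale + lo

w = np.random.randn(1000).astype(np.float32)
codes, lo, scale = quantize(w)
print(np.max(np.abs(dequantize(codes, lo, scale) - w)))  # within scale/2
```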
Bag of Local Convolutional Triplets for Script Identification in Scene Text
Jan Zdenek, Hideki Nakayama
DOI: 10.1109/ICDAR.2017.68 (https://doi.org/10.1109/ICDAR.2017.68)
Published: 2017-11-01
Abstract: The increasing interest in scene text reading in multilingual environments raises the need to recognize and distinguish between different writing systems. In this paper, we propose a novel method for script identification in scene text using triplets of local convolutional features in combination with the traditional bag-of-visual-words model. Feature triplets are created by making combinations of descriptors extracted from local patches of the input images using a convolutional neural network. This approach allows us to generate a more descriptive codeword dictionary for the bag-of-visual-words model, as the low discriminative power of weak descriptors is enhanced by other descriptors in a triplet. The proposed method is evaluated on two public benchmark datasets for scene text script identification and a public dataset for script identification in video captions. The experiments demonstrate that our method outperforms the baseline and yields competitive results on all three datasets.
Citations: 10
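
The pipeline combines local descriptors into triplets, quantizes the triplets against a learned codebook, and pools codeword assignments into a bag-of-visual-words histogram. The sketch below stubs out the CNN descriptors with random features and uses plain k-means for the codebook; triplet selection and codebook size are assumptions for illustration.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import KMeans

# Bag-of-triplets sketch: concatenate descriptor triples, quantize with
# k-means, pool into a histogram. Random features stand in for CNN output.
def make_triplets(descriptors, max_triplets=500):
    """Concatenate descriptor triples into single vectors."""
    idx = list(combinations(range(len(descriptors)), 3))[:max_triplets]
    return np.stack([np.concatenate([descriptors[i] for i in t]) for t in idx])

rng = np.random.default_rng(0)
local_feats = rng.normal(size=(20, 32))          # stand-in CNN descriptors
triplets = make_triplets(local_feats)
codebook = KMeans(n_clusters=64, n_init=10).fit(triplets)
hist = np.bincount(codebook.predict(triplets), minlength=64)  # BoVW vector
```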
Text Proposals Based on Windowed Maximally Stable Extremal Region for Scene Text Detection
Feng Su, Wenjun Ding, Lan Wang, Susu Shan, Hailiang Xu
DOI: 10.1109/ICDAR.2017.69 (https://doi.org/10.1109/ICDAR.2017.69)
Published: 2017-11-01
Abstract: The generation of text proposals (i.e., local candidate regions most likely to contain textual components) is a critical and prerequisite step in the scene text detection task. One popular text proposal algorithm, the Maximally Stable Extremal Region (MSER), has been exploited by many successful text detection methods, but it has difficulty handling complicated scene text involving touching characters and characters composed of multiple unconnected parts (e.g., Chinese characters and text in dot-matrix fonts). In this paper, we propose a novel text proposal method for localizing text in natural images, which integrates the MSER algorithm with a multi-scale sliding-window framework and efficiently extracts Windowed Maximally Stable Extremal Regions (WMSERs) as text proposals. We further present effective proposal filtering and grouping algorithms for exploiting WMSER-based proposals in the text detection task. Experiments on public scene text datasets demonstrate the promise of the proposed method in dealing with complicated scene text.
Citations: 2
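
The proposal-generation step can be sketched with OpenCV's MSER detector run inside sliding windows, with detected boxes mapped back to image coordinates. Window size and stride below are arbitrary, and the paper's multi-scale handling, filtering, and grouping stages are omitted.

```python
import cv2

# Windowed MSER sketch: detect extremal regions per window, then offset
# their bounding boxes back into full-image coordinates.
def windowed_mser(gray, win=256, stride=128):
    mser = cv2.MSER_create()
    proposals = []
    for y in range(0, max(1, gray.shape[0] - win + 1), stride):
        for x in range(0, max(1, gray.shape[1] - win + 1), stride):
            window = gray[y:y + win, x:x + win]
            _, boxes = mser.detectRegions(window)
            for bx, by, bw, bh in boxes:
                proposals.append((x + bx, y + by, bw, bh))
    return proposals

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical input
print(len(windowed_mser(gray)))
```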
ICDAR2017 Robust Reading Challenge on Omnidirectional Video
M. Iwamura, Naoyuki Morimoto, Keishi Tainaka, Dena Bazazian, L. G. I. Bigorda, Dimosthenis Karatzas
DOI: 10.1109/ICDAR.2017.236 (https://doi.org/10.1109/ICDAR.2017.236)
Published: 2017-11-01
Abstract: Results of the ICDAR 2017 Robust Reading Challenge on Omnidirectional Video are presented. This competition uses the Downtown Osaka Scene Text (DOST) dataset, which was captured in Osaka, Japan with an omnidirectional camera and hence consists of sequential images (videos) from different view angles. Treating the sequential images as videos (video mode), two tasks are prepared: localisation and end-to-end recognition. Treating them as a set of still images (still image mode), three tasks are prepared: localisation, cropped word recognition, and end-to-end recognition. As the dataset was captured in Japan, it contains Japanese text but also text consisting of alphanumeric characters (Latin text). Hence, a submitted result for each task is evaluated in three ways: using the Japanese-only ground truth (GT), using the Latin-only GT, and using the combined GT of both. By the submission deadline, we had received two submissions, both in the text localisation task of the still image mode. We intend to continue the competition in open mode. Expecting further submissions, in this report we provide baseline results for all tasks in addition to the submissions from the community.
Citations: 17
Extremely Sparse Deep Learning Using Inception Modules with Dropfilters
Woo-Young Kang, Kyung-Wha Park, Byoung-Tak Zhang
DOI: 10.1109/ICDAR.2017.80 (https://doi.org/10.1109/ICDAR.2017.80)
Published: 2017-11-01
Abstract: This paper reports a successful application of a highly sparse convolutional network model to offline handwritten character recognition. The model makes use of spatial dropout techniques named dropfilters to sparsify the inception modules in GoogLeNet, resulting in extremely sparse deep networks. Trained on a handwritten dataset of 520 classes and 260,000 Hangul (Korean) characters for tablet PCs and smartphones, the model is industry-deployable in terms of model size and performance. The proposed model obtains a significant improvement in recognition performance while using far fewer parameters than LeNet, a classical sparse convolutional network. We also evaluated the dropfiltered inception networks on the handwritten Hangul dataset and achieved 3.275% higher recognition accuracy with approximately three times fewer parameters than a deep network based on the LeNet structure without dropfilters.
Citations: 0
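
Spatial dropout, the family of techniques the dropfilters belong to, zeroes entire feature maps rather than individual activations, so whole filters are dropped per training sample. Whether this matches the paper's exact dropfilter formulation is an assumption; the sketch below uses PyTorch's nn.Dropout2d inside an inception-style 1x1-then-3x3 branch with illustrative channel counts.

```python
import torch
import torch.nn as nn

# Filter-level dropout in the spirit of "dropfilters": nn.Dropout2d zeroes
# whole feature maps during training, dropping entire filters per sample.
block = nn.Sequential(
    nn.Conv2d(64, 96, kernel_size=1),                 # 1x1 bottleneck
    nn.ReLU(),
    nn.Dropout2d(p=0.3),                              # drops whole channels
    nn.Conv2d(96, 128, kernel_size=3, padding=1),     # 3x3 branch
    nn.ReLU(),
)

x = torch.randn(8, 64, 28, 28)
block.train()
y = block(x)                   # some of the 96 intermediate maps are zeroed
print(y.shape)                 # torch.Size([8, 128, 28, 28])
```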