2019 International Conference on Document Analysis and Recognition (ICDAR): Latest Publications

Hybrid Training Data for Historical Text OCR
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00096
J. Martínek, Ladislav Lenc, P. Král, Anguelos Nicolaou, V. Christlein
{"title":"Hybrid Training Data for Historical Text OCR","authors":"J. Martínek, Ladislav Lenc, P. Král, Anguelos Nicolaou, V. Christlein","doi":"10.1109/ICDAR.2019.00096","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00096","url":null,"abstract":"Current optical character recognition (OCR) systems commonly make use of recurrent neural networks (RNN) that process whole text lines. Such systems avoid the task of character segmentation necessary for character-based approaches. A disadvantage of this approach is a need of a large amount of annotated data. This can be solved by sing generated synthetic data instead of costly manually annotated ones. Unfortunately, such data is often not suitable for historical documents particularly for quality reasons. This work presents a hybrid approach for generating annotated data for OCR at a low cost. We first collect a small dataset of isolated characters from historical document images. Then, we generate historical looking text lines from the generated characters. Another contribution lies in the design and implementation of an OCR system based on a convolutional-LSTM network. We first pre-train this system on hybrid data. Afterwards, the network is fine-tuned with real printed text lines. We demonstrate that this training strategy is efficient for obtaining state-of-the-art results. We also show that the score of the proposed system is comparable or even better in comparison to several state-of-the-art systems.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122076164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
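As a concrete illustration of the convolutional-LSTM line recognizer described above, here is a minimal PyTorch sketch wired for CTC training; the layer sizes and the CTC head are assumptions, not the authors' exact configuration. The same model would first be trained on the hybrid lines and then fine-tuned on real printed lines.

```python
import torch
import torch.nn as nn

class ConvLSTMOCR(nn.Module):
    """Minimal convolutional-LSTM line recognizer (illustrative sizes)."""
    def __init__(self, num_classes: int, img_height: int = 32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.rnn = nn.LSTM(128 * (img_height // 4), 256,
                           bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes + 1)  # +1 for the CTC blank

    def forward(self, x):                 # x: (B, 1, H, W) text-line image
        f = self.cnn(x)                   # (B, 128, H/4, W/4)
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # one timestep per column
        out, _ = self.rnn(f)
        return self.fc(out).log_softmax(-1)  # log-probs for nn.CTCLoss
```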
Cascaded Detail-Preserving Networks for Super-Resolution of Document Images
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00047
Zhichao Fu, Yu Kong, Yingbin Zheng, Hao Ye, Wenxin Hu, Jing Yang, Liang He
{"title":"Cascaded Detail-Preserving Networks for Super-Resolution of Document Images","authors":"Zhichao Fu, Yu Kong, Yingbin Zheng, Hao Ye, Wenxin Hu, Jing Yang, Liang He","doi":"10.1109/ICDAR.2019.00047","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00047","url":null,"abstract":"The accuracy of OCR is usually affected by the quality of the input document image and different kinds of marred document images hamper the OCR results. Among these scenarios, the low-resolution image is a common and challenging case. In this paper, we propose the cascaded networks for document image super-resolution. Our model is composed by the Detail-Preserving Networks with small magnification. The loss function with perceptual terms is designed to simultaneously preserve the original patterns and enhance the edge of the characters. These networks are trained with the same architecture and different parameters and then assembled into a pipeline model with a larger magnification. The low-resolution images can upscale gradually by passing through each Detail-Preserving Network until the final high-resolution images. Through extensive experiments on two scanning document image datasets, we demonstrate that the proposed approach outperforms recent state-of-the-art image super-resolution methods, and combining it with standard OCR system lead to signification improvements on the recognition results.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116834359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
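The cascading idea, several identical small-magnification stages chained to reach a larger total magnification, can be sketched as follows. This is a hypothetical PyTorch fragment: the stage layout and channel counts are assumptions, and the paper's perceptual and edge loss terms are omitted.

```python
import torch
import torch.nn as nn

class SRStage(nn.Module):
    """One small-magnification (2x) detail-preserving stage (assumed layout)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 4, 3, padding=1),  # 4 = 1 channel x (2*2) upscale
            nn.PixelShuffle(2),                    # rearrange to 2x resolution
        )

    def forward(self, x):
        return self.body(x)

# Two 2x stages, same architecture but separate parameters, assembled into a
# 4x pipeline; the image upscales gradually as it passes through each stage.
stages = nn.ModuleList([SRStage(), SRStage()])
hr = torch.randn(1, 1, 64, 64)
for stage in stages:
    hr = stage(hr)          # (1, 1, 256, 256) after both stages
```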
Detecting Named Entities in Unstructured Bengali Manuscript Images
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00040
Chandranath Adak, B. Chaudhuri, Chin-Teng Lin, M. Blumenstein
{"title":"Detecting Named Entities in Unstructured Bengali Manuscript Images","authors":"Chandranath Adak, B. Chaudhuri, Chin-Teng Lin, M. Blumenstein","doi":"10.1109/ICDAR.2019.00040","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00040","url":null,"abstract":"In this paper, we undertake a task to find named entities directly from unstructured handwritten document images without any intermediate text/character recognition. Here, we do not receive any assistance from natural language processing. Therefore, it becomes more challenging to detect the named entities. We work on Bengali script which brings some additional hurdles due to its own unique script characteristics. Here, we propose a new deep neural network-based architecture to extract the latent features from a text image. The embedding is then fed to a BLSTM (Bidirectional Long Short-Term Memory) layer. After that, the attention mechanism is adapted to an approach for named entity detection. We perform experimentation on two publicly-available offline handwriting repositories containing 420 Bengali handwritten pages in total. The experimental outcome of our system is quite impressive as it attains 95.43% balanced accuracy on overall named entity detection.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129400458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
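A minimal sketch of the described pipeline (CNN features, a BLSTM, then attention pooling for entity/non-entity classification) might look like the following PyTorch fragment; all dimensions and the binary output are assumptions.

```python
import torch
import torch.nn as nn

class NEDetector(nn.Module):
    """CNN feature sequence -> BLSTM -> attention pooling -> classifier."""
    def __init__(self, hidden: int = 128, num_classes: int = 2):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),   # collapse height, keep width
        )
        self.blstm = nn.LSTM(64, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.cls = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                            # x: (B, 1, H, W) word image
        f = self.cnn(x).squeeze(2).transpose(1, 2)   # (B, W', 64)
        h, _ = self.blstm(f)                         # (B, W', 2*hidden)
        a = torch.softmax(self.attn(h), dim=1)       # attention over timesteps
        pooled = (a * h).sum(dim=1)                  # (B, 2*hidden)
        return self.cls(pooled)                      # entity vs. non-entity logits
```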
Binarization of Degraded Document Images using Convolutional Neural Networks Based on Predicted Two-Channel Images
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00160
Y. Akbari, A. Britto, S. Al-Maadeed, Luiz Oliveira
{"title":"Binarization of Degraded Document Images using Convolutional Neural Networks Based on Predicted Two-Channel Images","authors":"Y. Akbari, A. Britto, S. Al-Maadeed, Luiz Oliveira","doi":"10.1109/ICDAR.2019.00160","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00160","url":null,"abstract":"Due to the poor condition of most of historical documents, binarization is difficult to separate document image background pixels from foreground pixels. This paper proposes Convolutional Neural Networks (CNNs) based on predicted two-channel images in which CNNs are trained to classify the foreground pixels. The promising results from the use of multispectral images for semantic segmentation inspired our efforts to create a novel prediction-based two-channel image. In our method, the original image is binarized by the structural symmetric pixels (SSPs) method, and the two-channel image is constructed from the original image and its binarized image. In order to explore impact of proposed two-channel images as network inputs, we use two popular CNNs architectures, namely SegNet and U-net. The results presented in this work show that our approach fully outperforms SegNet and U-net when trained by the original images and demonstrates competitiveness and robustness compared with state-of-the-art results using the DIBCO database.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129331034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 18
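Assembling the prediction-based two-channel input could look like the sketch below, where Otsu thresholding stands in for the SSP binarization (an assumption; the paper uses SSPs) and the result feeds any segmentation network expecting two input channels, such as SegNet or U-Net.

```python
import numpy as np
import cv2

def two_channel_input(gray: np.ndarray) -> np.ndarray:
    """Stack the original grayscale page (uint8) with a rough binarization.
    Otsu is a stand-in here for the paper's SSP method."""
    _, rough = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    stacked = np.stack([gray, rough], axis=0).astype(np.float32) / 255.0
    return stacked  # (2, H, W), ready for a 2-channel CNN input
```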
ReS2TIM: Reconstruct Syntactic Structures from Table Images
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00125
Wenyuan Xue, Qingyong Li, D. Tao
{"title":"ReS2TIM: Reconstruct Syntactic Structures from Table Images","authors":"Wenyuan Xue, Qingyong Li, D. Tao","doi":"10.1109/ICDAR.2019.00125","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00125","url":null,"abstract":"Tables often represent densely packed but structured data. Understanding table semantics is vital for effective information retrieval and data mining. Unlike web tables, whose semantics are readable directly from markup language and contents, the full analysis of tables published as images requires the conversion of discrete data into structured information. This paper presents a novel framework to convert a table image into its syntactic representation through the relationships between its cells. In order to reconstruct the syntactic structures of a table, we build a cell relationship network to predict the neighbors of each cell in four directions. During the training stage, a distance-based sample weight is proposed to handle the class imbalance problem. According to the detected relationships, the table is represented by a weighted graph that is then employed to infer the basic syntactic table structure. Experimental evaluation of the proposed framework using two datasets demonstrates the effectiveness of our model for cell relationship detection and table structure inference.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127120199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 27
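One plausible reading of the cell-relationship idea is a pairwise head over detected cell features that predicts the four-direction neighbor relations, plus a distance-based weight favoring the rare nearby positives. The sketch below is an assumption-laden illustration, not the paper's exact network.

```python
import torch
import torch.nn as nn

class NeighborHead(nn.Module):
    """Pairwise classifier: is cell b the left/right/up/down neighbor of a?"""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 4),  # logits for the four directions
        )

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.mlp(torch.cat([a, b], dim=-1))

def distance_weight(center_a, center_b, sigma: float = 100.0):
    """Distance-based sample weight: nearby pairs (the rare positives)
    receive larger weight, easing the class-imbalance problem."""
    d = torch.linalg.norm(center_a - center_b, dim=-1)
    return torch.exp(-d / sigma)
```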
A New Document Image Quality Assessment Method Based on Hast Derivations
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00201
Alireza Alaei
{"title":"A New Document Image Quality Assessment Method Based on Hast Derivations","authors":"Alireza Alaei","doi":"10.1109/ICDAR.2019.00201","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00201","url":null,"abstract":"With the rapid emergence of new technologies, a voluminous number of images including document images is generated every day. Considering the volume of data and complexity of processes, manual analysis, annotation, recognition, classification, and retrieval, of such document images is impossible. To automatically deal with such processes, many document image analysis applications exist in the literature and many of them are currently in place in different organisation and institutes. The performance of those applications are directly affected by the quality of document images. Therefore, a document image quality assessment (DIQA) method is of primary need to allow users capture, compress and forward good quality (readable) document images to various information systems, such as online business and insurance, for further processing. To assess the quality of document images, this paper proposes a new full-reference DIQA method using first followed by second order Hast derivations. A similarity map is then created using second order Hast derivation maps obtained by employing Hast filters on both reference and distorted images. An average pooling is then employed to obtain a quality score for the distorted document image. To evaluate the proposed method, two different datasets were used. Both datasets are composed of images with the mean human opinion scores (MHOS) considered as ground truth. The results obtained from the proposed DIQA method are superior to the results reported in the literature.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125687967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
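The overall recipe (derivative maps of reference and distorted images, a pointwise similarity map, then average pooling to a scalar score) can be sketched as follows. Sobel responses stand in for the Hast derivative filters, which is purely an assumption, as is the stabilizing constant.

```python
import numpy as np
import cv2

def quality_score(ref: np.ndarray, dist: np.ndarray, eps: float = 1e-3) -> float:
    """Full-reference score: similarity of derivative maps, average-pooled."""
    def grad_mag(img):
        gx = cv2.Sobel(img, cv2.CV_64F, 1, 0)   # stand-in for Hast filters
        gy = cv2.Sobel(img, cv2.CV_64F, 0, 1)
        return np.sqrt(gx ** 2 + gy ** 2)

    g_ref, g_dist = grad_mag(ref), grad_mag(dist)
    similarity = (2 * g_ref * g_dist + eps) / (g_ref ** 2 + g_dist ** 2 + eps)
    return float(similarity.mean())             # average pooling over the map
```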
Layout Analysis on Challenging Historical Arabic Manuscripts using Siamese Network
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00123
Reem Alaasam, Berat Kurar, Jihad El-Sana
{"title":"Layout Analysis on Challenging Historical Arabic Manuscripts using Siamese Network","authors":"Reem Alaasam, Berat Kurar, Jihad El-Sana","doi":"10.1109/ICDAR.2019.00123","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00123","url":null,"abstract":"This paper presents layout analysis for historical Arabic documents using siamese network. Given pages from different documents, we divide them into patches of similar sizes. We train a siamese network model that takes as an input a pair of patches and gives as an output a distance that corresponds to the similarity between the two patches. We used the trained model to calculate a distance matrix which in turn is used to cluster the patches of a page as either main text, side text or a background patch. We evaluate our method on challenging historical Arabic manuscripts dataset and report the F-measure. We show the effectiveness of our method by comparing with other works that use deep learning approaches, and show that we have state of art results.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131716758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
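A minimal Siamese sketch in PyTorch: a shared encoder embeds two patches, and the model returns their Euclidean distance, which can populate the distance matrix used for clustering. Encoder depth and embedding size are assumptions.

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    """Shared encoder; output is the distance between two patch embeddings."""
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )

    def forward(self, patch_a: torch.Tensor, patch_b: torch.Tensor):
        za, zb = self.encoder(patch_a), self.encoder(patch_b)
        return torch.linalg.norm(za - zb, dim=-1)  # small = similar patches
```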
Cross-Modal Prototype Learning for Zero-Shot Handwriting Recognition
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00100
Xiang Ao, Xu-Yao Zhang, Hong-Ming Yang, Fei Yin, Cheng-Lin Liu
{"title":"Cross-Modal Prototype Learning for Zero-Shot Handwriting Recognition","authors":"Xiang Ao, Xu-Yao Zhang, Hong-Ming Yang, Fei Yin, Cheng-Lin Liu","doi":"10.1109/ICDAR.2019.00100","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00100","url":null,"abstract":"In contrast to machine recognizers that rely on training with large handwriting data, humans can recognize handwriting accurately on learning from few samples, and can even generalize to handwritten characters from printed samples. Simulating this ability in machine recognition is important to alleviate the burden of labeling large handwriting data, especially for large category set as in Chinese text. In this paper, inspired by human learning, we propose a cross-modal prototype learning (CMPL) method for zero-shot online handwritten character recognition: for unseen categories, handwritten characters can be recognized without learning from handwritten samples, but instead from printed characters. Particularly, the printed characters (one for each class) are embedded into a convolutional neural network (CNN) feature space to obtain prototypes representing each class, while the online handwriting trajectories are embedded with a recurrent neural network (RNN). Via cross-modal joint learning, handwritten characters can be recognized according to the printed prototypes. For unseen categories, handwritten characters can be recognized by only feeding a printed sample per category. Experiments on a benchmark Chinese handwriting database have shown the effectiveness and potential of the proposed method for zero-shot handwriting recognition.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"155 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123414889","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
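The cross-modal matching step might be sketched as follows: an RNN embeds the pen trajectory, and classification is a nearest-prototype lookup against CNN embeddings of one printed sample per class. Input features, sizes, and the nearest-neighbor rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TrajectoryEncoder(nn.Module):
    """RNN embedding of an online trajectory (pen-point features per step)."""
    def __init__(self, in_dim: int = 4, embed_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(in_dim, embed_dim, batch_first=True)

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(traj)        # traj: (B, T, in_dim)
        return h[-1]                 # (B, embed_dim) final hidden state

def classify(traj_embed: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    """prototypes: (num_classes, embed_dim) CNN embeddings of printed chars;
    each handwritten sample is assigned to its nearest printed prototype."""
    dists = torch.cdist(traj_embed, prototypes)   # (B, num_classes)
    return dists.argmin(dim=-1)
```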
An End-to-End Trainable Framework for Joint Optimization of Document Enhancement and Recognition
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00019
Anupama Ray, Manoj Sharma, Avinash Upadhyay, Megh Makwana, S. Chaudhury, Akkshita Trivedi, Ajay Pratap Singh, Anil K. Saini
{"title":"An End-to-End Trainable Framework for Joint Optimization of Document Enhancement and Recognition","authors":"Anupama Ray, Manoj Sharma, Avinash Upadhyay, Megh Makwana, S. Chaudhury, Akkshita Trivedi, Ajay Pratap Singh, Anil K. Saini","doi":"10.1109/ICDAR.2019.00019","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00019","url":null,"abstract":"Recognizing text from degraded and low-resolution document images is still an open challenge in the vision community. Existing text recognition systems require a certain resolution and fails if the document is of low-resolution or heavily degraded or noisy. This paper presents an end-to-end trainable deep-learning based framework for joint optimization of document enhancement and recognition. We are using a generative adversarial network (GAN) based framework to perform image denoising followed by deep back projection network (DBPN) for super-resolution and use these super-resolved features to train a bidirectional long short term memory (BLSTM) with Connectionist Temporal Classification (CTC) for recognition of textual sequences. The entire network is end-to-end trainable and we obtain improved results than state-of-the-art for both the image enhancement and document recognition tasks. We demonstrate results on both printed and handwritten degraded document datasets to show the generalization capability of our proposed robust framework.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126097506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
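Only the wiring of the joint pipeline is sketched below; the three stages are placeholders standing in for the paper's GAN generator, DBPN super-resolver, and BLSTM-CTC recognizer, and a joint objective would add the enhancement losses to the CTC term.

```python
import torch.nn as nn

class JointModel(nn.Module):
    """Denoise -> super-resolve -> recognize, trainable end to end."""
    def __init__(self, denoiser: nn.Module, sr: nn.Module, recognizer: nn.Module):
        super().__init__()
        self.denoiser, self.sr, self.recognizer = denoiser, sr, recognizer

    def forward(self, degraded):
        clean = self.denoiser(degraded)    # GAN-style generator output
        upscaled = self.sr(clean)          # back-projection super-resolution
        return self.recognizer(upscaled)   # per-timestep log-probs for CTC
```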
Document Binarization via Multi-resolutional Attention Model with DRD Loss
2019 International Conference on Document Analysis and Recognition (ICDAR) | Pub Date: 2019-09-01 | DOI: 10.1109/ICDAR.2019.00017
Xujun Peng, Chao Wang, Huaigu Cao
{"title":"Document Binarization via Multi-resolutional Attention Model with DRD Loss","authors":"Xujun Peng, Chao Wang, Huaigu Cao","doi":"10.1109/ICDAR.2019.00017","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00017","url":null,"abstract":"Document binarization which separates text from background is a critical pre-processing step for many high level document analysis tasks. Conventional document binarization approaches tend to use hand-craft features and empirical rules to simulate the degradation process of document image and accomplish the binarization task. In this paper, we propose a deep learning framework where the probability of text areas is inferred through a multi-resolutional attention model, which is consequently fed into a convolutional conditional random field (ConvCRF) to obtain the final binarized document image. In the proposed approach, the features of degraded document image are learned by neural networks and the relations between text areas and backgrounds are inferred by ConvCRF, which avoids the dependence of domain knowledge from researchers and has more generalization capabilities. The experimental results on public datasets show that the proposed method has superior binarization performance than the existing state-of-the-art approaches.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122660960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 17
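The multi-resolutional attention idea can be illustrated as below: features computed at two scales are merged through a learned per-pixel attention map that weights each scale. The two-scale choice and channel counts are assumptions, and the DRD loss and ConvCRF stages are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiResAttention(nn.Module):
    """Fuse full- and half-resolution features with per-pixel scale weights."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.conv_full = nn.Conv2d(1, channels, 3, padding=1)
        self.conv_half = nn.Conv2d(1, channels, 3, padding=1)
        self.attn = nn.Conv2d(2 * channels, 2, 1)   # one weight per scale

    def forward(self, x):                           # x: (B, 1, H, W)
        f_full = self.conv_full(x)
        half = F.avg_pool2d(x, 2)
        f_half = F.interpolate(self.conv_half(half), size=x.shape[-2:],
                               mode='bilinear', align_corners=False)
        w = torch.softmax(self.attn(torch.cat([f_full, f_half], 1)), dim=1)
        return w[:, :1] * f_full + w[:, 1:] * f_half  # attention-weighted fusion
```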