2019 International Conference on Document Analysis and Recognition (ICDAR) — Latest Publications

Table Structure Extraction with Bi-Directional Gated Recurrent Unit Networks
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00220
Saqib Ali Khan, Syed Khalid, M. Shahzad, F. Shafait
Abstract: Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem, not only because of the large variation in table layouts and styles, but also owing to variations in page layouts and noise contamination levels. Much research has been done to identify table structure, most of it based on applying heuristics, with the aid of optical character recognition (OCR), to hand-pick layout features of the tables. These methods fail to generalize well because of variations in table layouts and the errors generated by OCR. In this paper, we propose a robust deep-learning-based approach to extract rows and columns from a detected table in document images with high precision. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional recurrent neural network with gated recurrent units (GRUs), followed by a fully-connected layer with softmax activation. The network scans the images top-to-bottom as well as left-to-right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperformed state-of-the-art table structure extraction systems by a significant margin.
Citations: 43
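The scan described in the abstract — a recurrent network stepping over the table image and classifying each position as separator or not — can be sketched with a single GRU cell in NumPy. This is a minimal illustration only: the cell equations are the standard GRU update, but the weight values are random, the sizes are arbitrary, and the logistic output layer stands in for the paper's trained bi-directional network with softmax.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell; sizes and (random) weights are purely illustrative."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hidden_size, input_size + hidden_size)
        self.Wz = rng.normal(0, 0.1, shape)  # update gate
        self.Wr = rng.normal(0, 0.1, shape)  # reset gate
        self.Wh = rng.normal(0, 0.1, shape)  # candidate state
        self.hidden_size = hidden_size

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                      # update gate
        r = sigmoid(self.Wr @ xh)                      # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde               # new hidden state

def classify_separators(row_features, cell, w_out):
    """Scan rows top-to-bottom; emit P(row is a separator) for each row."""
    h = np.zeros(cell.hidden_size)
    probs = []
    for x in row_features:
        h = cell.step(x, h)
        probs.append(sigmoid(w_out @ h))
    return np.array(probs)

# Toy usage: 6 "rows" of 4-dimensional features.
rows = np.random.default_rng(1).normal(size=(6, 4))
cell = GRUCell(input_size=4, hidden_size=8)
w_out = np.ones(8)
p = classify_separators(rows, cell, w_out)
```

A trained system would threshold these per-position probabilities to place the row (and, in a second left-to-right scan, column) separators.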
Offline Signature Verification using Structural Dynamic Time Warping
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00181
Michael Stauffer, Paul Maergner, Andreas Fischer, R. Ingold, Kaspar Riesen
Abstract: In recent years, different approaches to handwriting recognition based on graph representations have been proposed (e.g. graph-based keyword spotting or signature verification). This trend is mostly due to the availability of novel fast graph matching algorithms, as well as the inherent flexibility and expressivity of graph data structures compared to vectorial representations. That is, graphs can directly adapt their size and structure to the size and complexity of the respective handwritten entities. However, the vast majority of the proposed approaches match the graphs from a global perspective only. In the present paper, we propose to match the underlying graphs from different local perspectives and combine the resulting assignments by means of Dynamic Time Warping. Moreover, we show that the proposed approach can readily be combined with global matchings. In an experimental evaluation, we employ the novel method in a signature verification scenario on two widely used benchmark datasets. On both datasets, we empirically confirm that the proposed approach outperforms state-of-the-art methods with respect to both accuracy and runtime.
Citations: 7
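The paper combines local graph matchings via Dynamic Time Warping. The DTW recurrence itself is standard and is sketched below on plain 1-D sequences; the paper of course applies it to sequences of graph dissimilarities, not scalars.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic-time-warping distance between two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # best of diagonal match, insertion, and deletion moves
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

Because the warping path may stretch either sequence, repeated elements are absorbed at no cost, e.g. `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0`.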
An End-to-End Trainable System for Offline Handwritten Chemical Formulae Recognition
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00098
Xiaoxue Liu, Ting Zhang, Xinguo Yu
Abstract: In this paper, we propose an end-to-end trainable system for recognizing handwritten chemical formulae. The system recognizes one chemical formula at a time, rather than a single chemical symbol or a whole chemical equation, which is in line with people's writing habits and could at the same time help develop methods for recognizing complicated chemical equations. The proposed system adopts the CNN+RNN+CTC framework, one of the state-of-the-art methods for image-based sequence labelling tasks. We extend the capability of the CNN+RNN+CTC framework to interpret 2D spatial relationships (such as the 'subscript' found in chemical formulae) by introducing additional labels to represent them. The system, evaluated on a self-collected dataset of 12,224 samples, achieves a recognition rate of 94.98% at the chemical formula level.
Citations: 1
Text Line Segmentation in Historical Document Images Using an Adaptive U-Net Architecture
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00066
Olfa Mechi, Maroua Mehri, R. Ingold, N. Amara
Abstract: In most document image transcription, indexing and retrieval systems, text line segmentation remains one of the most important preliminary tasks. Hence, the research community working on document image analysis is particularly interested in providing reliable text line segmentation methods. Recently, increasing interest in deep-learning-based methods has been noted for solving the various sub-tasks of document image analysis. Thanks to the evolution of computer hardware and software, methods based on deep architectures continue to advance the state of the art in pattern recognition, particularly in historical document image analysis. In this paper we present a novel deep-learning-based method for text line segmentation of historical documents. The proposed method is based on an adaptive U-Net architecture. Qualitative and numerical experiments are reported on a large number of historical document images collected from the Tunisian national archives, as well as on several recent benchmarking datasets provided in the context of ICDAR and ICFHR competitions. Moreover, the results achieved are compared with those obtained using state-of-the-art methods.
Citations: 32
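A segmentation network such as a U-Net outputs a per-pixel text-line mask; turning that mask into individual line regions is typically done by connected-component grouping. The sketch below shows that standard post-processing step on a binary mask (pure Python, iterative 4-connected flood fill); it is generic glue code, not the paper's specific pipeline.

```python
def extract_lines(mask):
    """Group the 1-pixels of a binary mask into 4-connected components
    (ideally one component per text line); returns a list of pixel sets."""
    h, w = len(mask), len(mask[0])
    seen = set()
    lines = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and (sy, sx) not in seen:
                comp, stack = set(), [(sy, sx)]
                seen.add((sy, sx))
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    comp.add((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                lines.append(comp)
    return lines

# Two horizontal strokes separated by a blank row -> two line components.
mask = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
]
lines = extract_lines(mask)
```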
Multi-label Connectionist Temporal Classification
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00161
Curtis Wigington, Brian L. Price, Scott D. Cohen
Abstract: The Connectionist Temporal Classification (CTC) loss function [1] enables end-to-end training of a neural network for sequence-to-sequence tasks without the need for prior alignments between the input and output. CTC is traditionally used for training sequential, single-label problems: each element in the sequence has only one class. In this work, we show that CTC is not suitable for multi-label tasks, and we present a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification. Multi-label classes can represent meaningful attributes of a single element; for example, in Optical Music Recognition (OMR), a music note can have separate duration and pitch attributes. Our approach achieves state-of-the-art results on joint handwritten text recognition and named entity recognition, Asian character recognition, and OMR.
Citations: 11
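The "no prior alignment" property of CTC comes from its many-to-one collapse rule: a per-frame labelling is mapped to an output by merging adjacent repeats and then deleting blanks. The single-label version of that rule (which the paper generalizes to multi-label attributes) is small enough to sketch directly:

```python
BLANK = "-"

def ctc_collapse(frames, blank=BLANK):
    """CTC many-to-one map: merge adjacent repeated symbols, drop blanks.

    A blank between two identical symbols keeps them distinct, which is
    how CTC can emit genuine doubled letters ("ll", "oo", ...).
    """
    out, prev = [], None
    for s in frames:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return "".join(out)
```

For example `ctc_collapse("aa--ab-b")` yields `"aabb"`: the repeated `a` frames merge, the blanks separate the second `a` and the doubled `b`.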
Chemical Structure Recognition (CSR) System: Automatic Analysis of 2D Chemical Structures in Document Images
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00-41
S. S. Bukhari, Zaryab Iftikhar, A. Dengel
Abstract: In this era of advanced technology and automation, information extraction has become a very common practice for data analysis. Optical Character Recognition (OCR) is used to extract textual data for automatic information analysis or natural language processing of document images. However, in the field of cheminformatics, where 2D molecular structures must be recognized as they are published in research journals or patent documents, OCR is not adequate, as chemical compounds can be represented both in textual and in graphical format. The digital representation of an image-based chemical structure allows not only patent analysis teams to provide customized insights, but also cheminformatics research groups to enhance their molecular structure databases, which can further be used for querying structures as well as sub-structural patterns. Some tools have been built for the extraction and processing of image-based molecular structures; the Optical Structure Recognition Application (OSRA) is one tool that partially fulfills the task of converting chemical structures in document images into chemical formats (SMILES, SDF, or MOL). However, it has problems such as poor character recognition, false structure extraction, and slow processing. In this paper, we develop a prototype Chemical Structure Recognition (CSR) system using modern open-source image processing libraries, which allows us to extract the structural information of a chemical structure embedded as a digital raster image. The CSR system processes the chemical information contained in a chemical structure image and generates its SMILES or MOL representation. For performance evaluation, we used two different datasets to measure the potential of the CSR system. It yields better results than OSRA, with more accurate recognition and faster extraction.
Citations: 6
Multiple Comparative Attention Network for Offline Handwritten Chinese Character Recognition
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00101
Qingquan Xu, X. Bai, Wenyu Liu
Abstract: Recent advances in deep learning have brought great progress in offline Handwritten Chinese Character Recognition (HCCR). However, most existing CNN-based methods only utilize global image features as contextual guidance to classify characters, while neglecting the local discriminative features that are very important for HCCR. To overcome this limitation, we present a convolutional neural network with multiple comparative attention (MCANet) that produces separable local attention regions with discriminative features across different categories. Concretely, MCANet takes the last convolutional feature map as input and outputs multiple attention maps; a contrastive loss restricts the different attentions to selectively focus on different sub-regions. Moreover, we apply a region-level center loss to pull together features learned from the same class and different regions, further obtaining robust features invariant to large intra-class variance. Combined with a classification loss, our method learns which parts of the images are relevant for recognizing characters and adaptively integrates information from different regions to make the final prediction. We conduct experiments on the ICDAR 2013 offline HCCR competition dataset, and our proposed approach achieves an accuracy of 97.66%, outperforming all single-network methods trained only on handwritten data.
Citations: 7
TextEdge: Multi-oriented Scene Text Detection via Region Segmentation and Edge Classification
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00067
Chen Du, Chunheng Wang, Yanna Wang, Zipeng Feng, Jiyuan Zhang
Abstract: Semantic-segmentation-based scene text detection algorithms typically use bounding-box regions, or shrunken versions of them, to represent text pixels. However, the non-text pixel information in these regions easily degrades detection performance, because such semantic segmentation methods need accurately annotated pixel-level training data to perform well and are sensitive to noise and interference. In this work, we propose a fully convolutional network (FCN) based method termed TextEdge for multi-oriented scene text detection. Compared with previous methods that simply use bounding-box regions as the segmentation mask, TextEdge introduces the text-region edge map as a new segmentation mask. Edge information is more representative of text areas and proves effective in improving detection performance. TextEdge is optimized end-to-end with multi-task outputs: text vs. non-text classification, text-edge prediction, and text boundary regression. Experiments on standard datasets demonstrate that the proposed method achieves state-of-the-art performance in both accuracy and efficiency. Specifically, it achieves an F-score of 0.88 on the ICDAR 2013 dataset and 0.86 on the ICDAR 2015 dataset.
Citations: 5
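The text-region edge map that TextEdge uses as a supervision target can be derived from a region mask: a pixel is an edge pixel if it belongs to the region but has a neighbour outside it. A simple sketch of that derivation on a binary mask (4-connectivity and pure Python are our simplifying assumptions, not necessarily the paper's exact recipe):

```python
def edge_map(mask):
    """1 where a text pixel borders a non-text pixel or the image border."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    # outside the image or outside the region -> boundary pixel
                    if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                        edges[y][x] = 1
                        break
    return edges

# A solid 3x3 region: the 8 outer pixels are edges, the centre is interior.
region = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
edges = edge_map(region)
```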
KeyWord Spotting using Siamese Triplet Deep Neural Networks
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00187
Yasmine Serdouk, V. Eglin, S. Bres, Mylène Pardoen
Abstract: Deep neural networks have shown great success in computer vision by achieving considerable state-of-the-art results, and they are beginning to attract strong interest in the document analysis community. In this paper, we present a novel siamese deep network with three inputs that retrieves the words most similar to a given query. The proposed system follows a query-by-example, segmentation-based approach and aims to learn representations of handwritten word images for which a simple Euclidean distance can perform the matching. The results obtained on the George Washington dataset show the potential and effectiveness of the proposed keyword spotting system.
Citations: 5
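A three-input siamese network of this kind is typically trained with a triplet loss: the anchor-to-positive distance should be smaller than the anchor-to-negative distance by at least a margin. The standard hinge form of that loss is sketched below on plain vectors (the margin value and embeddings are illustrative; the paper's exact loss may differ in detail).

```python
def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances:
    zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclid(anchor, positive) - euclid(anchor, negative) + margin)
```

With word-image embeddings trained this way, retrieval reduces to ranking all candidate words by `euclid(query_embedding, word_embedding)`, exactly as the abstract describes.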
A Relation Network Based Approach to Curved Text Detection
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00118
Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
Abstract: In this paper, a new relation-network-based approach to curved text detection is proposed by formulating it as a visual relationship detection problem. The key idea is to decompose curved text detection into two subproblems: detection of text primitives, and prediction of the link relationship for each nearby text primitive pair. Specifically, an anchor-free region proposal network based text detector first detects text primitives of different scales from different feature maps of a feature pyramid network, from which a manageable number of text primitive pairs are selected. Then, a relation network predicts whether each text primitive pair belongs to the same text instance. Finally, isolated text primitives are grouped into curved text instances based on the link relationships of text primitive pairs. Because pairwise link prediction uses features extracted from the bounding boxes of each text primitive and their union, the relation network can effectively leverage wider context information to improve link prediction accuracy. Furthermore, since the link relationships of relatively distant text primitives can be predicted robustly, our relation-network-based text detector is capable of detecting text instances with large inter-character spaces. Consequently, our proposed approach achieves superior performance not only on two public curved text detection datasets, Total-Text and SCUT-CTW1500, but also on a multi-oriented text detection dataset, MSRA-TD500.
Citations: 8
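The final grouping step — turning predicted pairwise links into text instances — amounts to finding connected components over the primitives. A union-find sketch of that step (generic glue code; the paper does not prescribe this particular data structure):

```python
def group_primitives(n, links):
    """Union-find: merge primitives joined by predicted links; return
    one sorted list of primitive indices per text instance."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Primitives 0-1-2 linked into one curved instance, 3-4 into another.
instances = group_primitives(5, [(0, 1), (1, 2), (3, 4)])
```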