2019 International Conference on Document Analysis and Recognition (ICDAR) — Latest Publications

Table Structure Extraction with Bi-Directional Gated Recurrent Unit Networks
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00220
Saqib Ali Khan, Syed Khalid, M. Shahzad, F. Shafait
Abstract: Tables present summarized and structured information to the reader, which makes table structure extraction an important part of document understanding applications. However, table structure identification is a hard problem, not only because of the large variation in table layouts and styles, but also owing to variations in page layouts and noise contamination levels. Much research has been done to identify table structure, most of it based on applying heuristics, with the aid of optical character recognition (OCR), to hand-pick layout features of the tables. These methods fail to generalize well because of variations in table layouts and the errors generated by OCR. In this paper, we propose a robust deep-learning-based approach to extract rows and columns from a detected table in document images with high precision. In the proposed solution, the table images are first pre-processed and then fed to a bi-directional recurrent neural network with gated recurrent units (GRUs), followed by a fully-connected layer with softmax activation. The network scans the images top-to-bottom as well as left-to-right and classifies each input as either a row-separator or a column-separator. We have benchmarked our system on the publicly available UNLV and ICDAR 2013 datasets, on which it outperformed state-of-the-art table structure extraction systems by a significant margin.
Citations: 43
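The scan described in the abstract — a recurrent network stepping over the table image and classifying each position as separator or not — can be sketched with a single GRU cell in NumPy. This is a minimal illustration only: the cell equations are the standard GRU update, but the weight values are random, the sizes are arbitrary, and the logistic output layer stands in for the paper's trained bi-directional network with softmax.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell; sizes and (random) weights are purely illustrative."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        shape = (hidden_size, input_size + hidden_size)
        self.Wz = rng.normal(0, 0.1, shape)  # update gate
        self.Wr = rng.normal(0, 0.1, shape)  # reset gate
        self.Wh = rng.normal(0, 0.1, shape)  # candidate state
        self.hidden_size = hidden_size

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                      # update gate
        r = sigmoid(self.Wr @ xh)                      # reset gate
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde               # new hidden state

def classify_separators(row_features, cell, w_out):
    """Scan rows top-to-bottom; emit P(row is a separator) for each row."""
    h = np.zeros(cell.hidden_size)
    probs = []
    for x in row_features:
        h = cell.step(x, h)
        probs.append(sigmoid(w_out @ h))
    return np.array(probs)

# Toy usage: 6 "rows" of 4-dimensional features.
rows = np.random.default_rng(1).normal(size=(6, 4))
cell = GRUCell(input_size=4, hidden_size=8)
w_out = np.ones(8)
p = classify_separators(rows, cell, w_out)
```

A trained system would threshold these per-position probabilities to place the row (and, in a second left-to-right scan, column) separators.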
Offline Signature Verification using Structural Dynamic Time Warping
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00181
Michael Stauffer, Paul Maergner, Andreas Fischer, R. Ingold, Kaspar Riesen
Abstract: In recent years, different approaches to handwriting recognition based on graph representations have been proposed (e.g. graph-based keyword spotting or signature verification). This trend is mostly due to the availability of novel fast graph matching algorithms, as well as the inherent flexibility and expressivity of graph data structures compared to vectorial representations. That is, graphs can directly adapt their size and structure to the size and complexity of the respective handwritten entities. However, the vast majority of the proposed approaches match the graphs from a global perspective only. In the present paper, we propose to match the underlying graphs from different local perspectives and combine the resulting assignments by means of Dynamic Time Warping. Moreover, we show that the proposed approach can readily be combined with global matchings. In an experimental evaluation, we employ the novel method in a signature verification scenario on two widely used benchmark datasets. On both datasets, we empirically confirm that the proposed approach outperforms state-of-the-art methods with respect to both accuracy and runtime.
Citations: 7
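The paper combines local graph matchings via Dynamic Time Warping. The DTW recurrence itself is standard and is sketched below on plain 1-D sequences; the paper of course applies it to sequences of graph dissimilarities, not scalars.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic dynamic-time-warping distance between two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # best of diagonal match, insertion, and deletion moves
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]
```

Because the warping path may stretch either sequence, repeated elements are absorbed at no cost, e.g. `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is `0.0`.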
An End-to-End Trainable System for Offline Handwritten Chemical Formulae Recognition
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00098
Xiaoxue Liu, Ting Zhang, Xinguo Yu
Abstract: In this paper, we propose an end-to-end trainable system for recognizing handwritten chemical formulae. The system recognizes one chemical formula at a time, rather than a single chemical symbol or a whole chemical equation, which is in line with people's writing habits and could at the same time help develop methods for recognizing complicated chemical equations. The proposed system adopts the CNN+RNN+CTC framework, one of the state-of-the-art methods for image-based sequence labelling tasks. We extend the capability of the CNN+RNN+CTC framework to interpret 2D spatial relationships (such as the 'subscript' found in chemical formulae) by introducing additional labels to represent them. The system, evaluated on a self-collected dataset of 12,224 samples, achieves a recognition rate of 94.98% at the chemical formula level.
Citations: 1
Text Line Segmentation in Historical Document Images Using an Adaptive U-Net Architecture
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00066
Olfa Mechi, Maroua Mehri, R. Ingold, N. Amara
Abstract: In most document image transcription, indexing and retrieval systems, text line segmentation remains one of the most important preliminary tasks. Hence, the research community working on document image analysis is particularly interested in providing reliable text line segmentation methods. Recently, increasing interest in deep-learning-based methods has been noted for solving the various sub-tasks of document image analysis. Thanks to the evolution of computer hardware and software, methods based on deep architectures continue to advance the state of the art in pattern recognition, particularly in historical document image analysis. In this paper we present a novel deep-learning-based method for text line segmentation of historical documents. The proposed method is based on an adaptive U-Net architecture. Qualitative and numerical experiments are reported on a large number of historical document images collected from the Tunisian national archives, as well as on several recent benchmarking datasets provided in the context of ICDAR and ICFHR competitions. Moreover, the results achieved are compared with those obtained using state-of-the-art methods.
Citations: 32
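A segmentation network such as a U-Net outputs a per-pixel text-line mask; turning that mask into individual line regions is typically done by connected-component grouping. The sketch below shows that standard post-processing step on a binary mask (pure Python, iterative 4-connected flood fill); it is generic glue code, not the paper's specific pipeline.

```python
def extract_lines(mask):
    """Group the 1-pixels of a binary mask into 4-connected components
    (ideally one component per text line); returns a list of pixel sets."""
    h, w = len(mask), len(mask[0])
    seen = set()
    lines = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and (sy, sx) not in seen:
                comp, stack = set(), [(sy, sx)]
                seen.add((sy, sx))
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    comp.add((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                lines.append(comp)
    return lines

# Two horizontal strokes separated by a blank row -> two line components.
mask = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1],
]
lines = extract_lines(mask)
```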
Multi-label Connectionist Temporal Classification
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00161
Curtis Wigington, Brian L. Price, Scott D. Cohen
Abstract: The Connectionist Temporal Classification (CTC) loss function [1] enables end-to-end training of a neural network for sequence-to-sequence tasks without the need for prior alignments between the input and output. CTC is traditionally used for training sequential, single-label problems: each element in the sequence has only one class. In this work, we show that CTC is not suitable for multi-label tasks, and we present a novel Multi-label Connectionist Temporal Classification (MCTC) loss function for multi-label, sequence-to-sequence classification. Multi-label classes can represent meaningful attributes of a single element; for example, in Optical Music Recognition (OMR), a music note can have separate duration and pitch attributes. Our approach achieves state-of-the-art results on joint handwritten text recognition and named entity recognition, Asian character recognition, and OMR.
Citations: 11
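The "no prior alignment" property of CTC comes from its many-to-one collapse rule: a per-frame labelling is mapped to an output by merging adjacent repeats and then deleting blanks. The single-label version of that rule (which the paper generalizes to multi-label attributes) is small enough to sketch directly:

```python
BLANK = "-"

def ctc_collapse(frames, blank=BLANK):
    """CTC many-to-one map: merge adjacent repeated symbols, drop blanks.

    A blank between two identical symbols keeps them distinct, which is
    how CTC can emit genuine doubled letters ("ll", "oo", ...).
    """
    out, prev = [], None
    for s in frames:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return "".join(out)
```

For example `ctc_collapse("aa--ab-b")` yields `"aabb"`: the repeated `a` frames merge, the blanks separate the second `a` and the doubled `b`.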
Chemical Structure Recognition (CSR) System: Automatic Analysis of 2D Chemical Structures in Document Images
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00-41
S. S. Bukhari, Zaryab Iftikhar, A. Dengel
Abstract: In this era of advanced technology and automation, information extraction has become a very common practice for data analysis. Optical Character Recognition (OCR) is used to extract textual data for automatic information analysis or natural language processing of document images. However, in the field of cheminformatics, where 2D molecular structures must be recognized as they are published in research journals or patent documents, OCR is not adequate, as chemical compounds can be represented both in textual and in graphical format. The digital representation of an image-based chemical structure allows not only patent analysis teams to provide customized insights, but also cheminformatics research groups to enhance their molecular structure databases, which can further be used for querying structures as well as sub-structural patterns. Some tools have been built for the extraction and processing of image-based molecular structures; the Optical Structure Recognition Application (OSRA) is one tool that partially fulfills the task of converting chemical structures in document images into chemical formats (SMILES, SDF, or MOL). However, it has problems such as poor character recognition, false structure extraction, and slow processing. In this paper, we develop a prototype Chemical Structure Recognition (CSR) system using modern open-source image processing libraries, which allows us to extract the structural information of a chemical structure embedded as a digital raster image. The CSR system processes the chemical information contained in a chemical structure image and generates its SMILES or MOL representation. For performance evaluation, we used two different datasets to measure the potential of the CSR system. It yields better results than OSRA, with more accurate recognition and faster extraction.
Citations: 6
Multiple Comparative Attention Network for Offline Handwritten Chinese Character Recognition
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00101
Qingquan Xu, X. Bai, Wenyu Liu
Abstract: Recent advances in deep learning have brought great progress in offline Handwritten Chinese Character Recognition (HCCR). However, most existing CNN-based methods only utilize global image features as contextual guidance to classify characters, while neglecting the local discriminative features that are very important for HCCR. To overcome this limitation, we present a convolutional neural network with multiple comparative attention (MCANet) that produces separable local attention regions with discriminative features across different categories. Concretely, MCANet takes the last convolutional feature map as input and outputs multiple attention maps; a contrastive loss restricts the different attentions to selectively focus on different sub-regions. Moreover, we apply a region-level center loss to pull together features learned from the same class and different regions, further obtaining robust features invariant to large intra-class variance. Combined with a classification loss, our method learns which parts of the images are relevant for recognizing characters and adaptively integrates information from different regions to make the final prediction. We conduct experiments on the ICDAR 2013 offline HCCR competition dataset, and our proposed approach achieves an accuracy of 97.66%, outperforming all single-network methods trained only on handwritten data.
Citations: 7
TextEdge: Multi-oriented Scene Text Detection via Region Segmentation and Edge Classification
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00067
Chen Du, Chunheng Wang, Yanna Wang, Zipeng Feng, Jiyuan Zhang
Abstract: Semantic-segmentation-based scene text detection algorithms typically use bounding-box regions, or shrunken versions of them, to represent text pixels. However, the non-text pixel information in these regions easily degrades detection performance, because such semantic segmentation methods need accurately annotated pixel-level training data to perform well and are sensitive to noise and interference. In this work, we propose a fully convolutional network (FCN) based method termed TextEdge for multi-oriented scene text detection. Compared with previous methods that simply use bounding-box regions as the segmentation mask, TextEdge introduces the text-region edge map as a new segmentation mask. Edge information is more representative of text areas and proves effective in improving detection performance. TextEdge is optimized end-to-end with multi-task outputs: text vs. non-text classification, text-edge prediction, and text boundary regression. Experiments on standard datasets demonstrate that the proposed method achieves state-of-the-art performance in both accuracy and efficiency. Specifically, it achieves an F-score of 0.88 on the ICDAR 2013 dataset and 0.86 on the ICDAR 2015 dataset.
Citations: 5
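The text-region edge map that TextEdge uses as a supervision target can be derived from a region mask: a pixel is an edge pixel if it belongs to the region but has a neighbour outside it. A simple sketch of that derivation on a binary mask (4-connectivity and pure Python are our simplifying assumptions, not necessarily the paper's exact recipe):

```python
def edge_map(mask):
    """1 where a text pixel borders a non-text pixel or the image border."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    # outside the image or outside the region -> boundary pixel
                    if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                        edges[y][x] = 1
                        break
    return edges

# A solid 3x3 region: the 8 outer pixels are edges, the centre is interior.
region = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
edges = edge_map(region)
```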
KeyWord Spotting using Siamese Triplet Deep Neural Networks
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00187
Yasmine Serdouk, V. Eglin, S. Bres, Mylène Pardoen
Abstract: Deep neural networks have shown great success in computer vision by achieving considerable state-of-the-art results, and they are beginning to attract strong interest in the document analysis community. In this paper, we present a novel siamese deep network with three inputs that retrieves the words most similar to a given query. The proposed system follows a query-by-example, segmentation-based approach and aims to learn representations of handwritten word images for which a simple Euclidean distance can perform the matching. The results obtained on the George Washington dataset show the potential and effectiveness of the proposed keyword spotting system.
Citations: 5
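A three-input siamese network of this kind is typically trained with a triplet loss: the anchor-to-positive distance should be smaller than the anchor-to-negative distance by at least a margin. The standard hinge form of that loss is sketched below on plain vectors (the margin value and embeddings are illustrative; the paper's exact loss may differ in detail).

```python
def euclid(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative distances:
    zero once the negative is at least `margin` farther than the positive."""
    return max(0.0, euclid(anchor, positive) - euclid(anchor, negative) + margin)
```

With word-image embeddings trained this way, retrieval reduces to ranking all candidate words by `euclid(query_embedding, word_embedding)`, exactly as the abstract describes.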
A Relation Network Based Approach to Curved Text Detection
2019 International Conference on Document Analysis and Recognition (ICDAR) Pub Date : 2019-09-01 DOI: 10.1109/ICDAR.2019.00118
Chixiang Ma, Zhuoyao Zhong, Lei Sun, Qiang Huo
Abstract: In this paper, a new relation-network-based approach to curved text detection is proposed by formulating it as a visual relationship detection problem. The key idea is to decompose curved text detection into two subproblems: detection of text primitives, and prediction of the link relationship for each nearby text primitive pair. Specifically, an anchor-free region proposal network based text detector first detects text primitives of different scales from different feature maps of a feature pyramid network, from which a manageable number of text primitive pairs are selected. Then, a relation network predicts whether each text primitive pair belongs to the same text instance. Finally, isolated text primitives are grouped into curved text instances based on the link relationships of text primitive pairs. Because pairwise link prediction uses features extracted from the bounding boxes of each text primitive and their union, the relation network can effectively leverage wider context information to improve link prediction accuracy. Furthermore, since the link relationships of relatively distant text primitives can be predicted robustly, our relation-network-based text detector is capable of detecting text instances with large inter-character spaces. Consequently, our proposed approach achieves superior performance not only on two public curved text detection datasets, Total-Text and SCUT-CTW1500, but also on a multi-oriented text detection dataset, MSRA-TD500.
Citations: 8
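The final grouping step — turning predicted pairwise links into text instances — amounts to finding connected components over the primitives. A union-find sketch of that step (generic glue code; the paper does not prescribe this particular data structure):

```python
def group_primitives(n, links):
    """Union-find: merge primitives joined by predicted links; return
    one sorted list of primitive indices per text instance."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Primitives 0-1-2 linked into one curved instance, 3-4 into another.
instances = group_primitives(5, [(0, 1), (1, 2), (3, 4)])
```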