David Etter, Stephen Rawls, Cameron Carpenter, Gregory Sell
{"title":"A Synthetic Recipe for OCR","authors":"David Etter, Stephen Rawls, Cameron Carpenter, Gregory Sell","doi":"10.1109/ICDAR.2019.00143","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00143","url":null,"abstract":"Synthetic data generation for optical character recognition (OCR) promises unlimited training data at zero annotation cost. With enough fonts and seed text, we should be able to generate data to train a model that approaches or exceeds the performance with real annotated data. Unfortunately, this is not always the reality. Unconstrained image settings, such as internet memes, scanned web pages, or newspapers, present diverse scripts, fonts, layouts, and complex backgrounds, which cause models trained with synthetic data to break down. In this work, we investigate the synthetic image generation problem on a large multilingual set of unconstrained document images. Our work presents a comprehensive evaluation of the impact of synthetic data attributes on model performance. The results provide a recipe for synthetic data generation that will help guide future research.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127728273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, Chuanming Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin
{"title":"ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT","authors":"Chee-Kheng Chng, Yuliang Liu, Yipeng Sun, Chun Chet Ng, Canjie Luo, Zihan Ni, Chuanming Fang, Shuaitao Zhang, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin","doi":"10.1109/ICDAR.2019.00252","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00252","url":null,"abstract":"This paper reports the ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text - RRC-ArT that consists of three major challenges: i) scene text detection, ii) scene text recognition, and iii) scene text spotting. A total of 78 submissions from 46 unique teams/individuals were received for this competition. The top performing score of each challenge is as follows: i) T1 - 82.65%, ii) T2.1 - 74.3%, iii) T2.2 - 85.32%, iv) T3.1 - 53.86%, and v) T3.2 - 54.91%. Apart from the results, this paper also details the ArT dataset, tasks description, evaluation metrics and participants' methods. The dataset, the evaluation kit as well as the results are publicly available at the challenge website.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133243687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comparative Study of Attention-Based Encoder-Decoder Approaches to Natural Scene Text Recognition","authors":"Fu'ze Cong, Wenping Hu, Qiang Huo, Li Guo","doi":"10.1109/ICDAR.2019.00151","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00151","url":null,"abstract":"Attention-based encoder-decoder approaches have shown promising results in scene text recognition. In the literature, models with different encoders, decoders and attention mechanisms have been proposed and compared on isolated word recognition tasks, where the models are trained on either synthetic word images or a small set of real-world images. In this paper, we investigate different components of the attention based framework and compare its performance with a CNN-DBLSTM-CTC based approach on large-scale real-world scene text sentence recognition tasks. We train character models by using more than 1.6M real-world text lines and compare their performance on test sets collected from a variety of real-world scenarios. Our results show that (1) attention on a two-dimensional feature map can yield better performance than one-dimensional one and an RNN based decoder performs better than CNN based one; (2) attention-based approaches can achieve higher recognition accuracy than CNN-DBLSTM-CTC based approaches on isolated word recognition tasks, but perform worse on sentence recognition tasks; (3) it is more effective and efficient for CNN-DBLSTM-CTC based approaches to leverage an explicit language model to boost recognition accuracy.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133041894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chris Tensmeyer, Vlad I. Morariu, Brian L. Price, Scott D. Cohen, Tony R. Martinez
{"title":"Deep Splitting and Merging for Table Structure Decomposition","authors":"Chris Tensmeyer, Vlad I. Morariu, Brian L. Price, Scott D. Cohen, Tony R. Martinez","doi":"10.1109/ICDAR.2019.00027","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00027","url":null,"abstract":"Given the large variety and complexity of tables, table structure extraction is a challenging task in automated document analysis systems. We present a pair of novel deep learning models (Split and Merge models) that given an input image, 1) predicts the basic table grid pattern and 2) predicts which grid elements should be merged to recover cells that span multiple rows or columns. We propose projection pooling as a novel component of the Split model and grid pooling as a novel part of the Merge model. While most Fully Convolutional Networks rely on local evidence, these unique pooling regions allow our models to take advantage of the global table structure. We achieve state-of-the-art performance on the public ICDAR 2013 Table Competition dataset of PDF documents. On a much larger private dataset which we used to train the models, we significantly outperform both a state-ofthe-art deep model and a major commercial software system.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133812336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yann Soullard, Wassim Swaileh, Pierrick Tranouez, T. Paquet, Clément Chatelain
{"title":"Improving Text Recognition using Optical and Language Model Writer Adaptation","authors":"Yann Soullard, Wassim Swaileh, Pierrick Tranouez, T. Paquet, Clément Chatelain","doi":"10.1109/ICDAR.2019.00190","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00190","url":null,"abstract":"State-of-the-art methods for handwriting text recognition are based on deep learning approaches and language modeling that require large data sets during training. In practice, there are some applications where the system processes mono-writer documents, and would thus benefit from being trained on examples from that writer. However, this is not common to have numerous examples coming from just one writer. In this paper, we propose an approach to adapt both the optical model and the language model to a particular writer, from a generic system trained on large data sets with a variety of examples. We show the benefits of the optical and language model writer adaptation. Our approach reaches competitive results on the READ 2018 data set, which is dedicated to model adaptation to particular writers.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130365346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Free Line Detection in Manuscripts using Distance Transform Graph","authors":"M. Kassis, Jihad El-Sana","doi":"10.1109/ICDAR.2019.00044","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00044","url":null,"abstract":"We present a fully automated learning free method, for line detection in manuscripts. We begin by separating components that span over multiple lines, then we remove noise, and small connected components such as diacritics. We apply a distance transform on the image to create the image skeleton. The skeleton is pruned, its vertexes and edges are detected, in order to generate the initial document graph. We calculate the vertex v-score using its t-score and l-score quantifying its distance from being an absolute link in a line. In a greedy manner we classify each edge in the graph either a link, a bridge or a conflict edge. We merge every two edges classified as link together, then we merge the conflict edges next. Finally we remove the bridge edges from the graph generating the final form of the graph. Each edge in the graph equals to one extracted line. We applied the method on the DIVA-hisDB dataset on both public and private sections. The public section participated in the recently conducted Layout Analysis for Challenging Medieval Manuscripts Competition, and we have achieved results surpassing the vast majority of these systems.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115092749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multi-oriented Chinese Keyword Spotter Guided by Text Line Detection","authors":"Pei Xu, Shan Huang, Hongzhen Wang, Hao Song, Shen Huang, Qi Ju","doi":"10.1109/icdar.2019.00112","DOIUrl":"https://doi.org/10.1109/icdar.2019.00112","url":null,"abstract":"Chinese keyword spotting is a challenging task as there is no visual blank for Chinese words. Different from English words which are split naturally by visual blanks, Chinese words are generally split only by semantic information. In this paper, we propose a new Chinese keyword spotter for natural images, which is inspired by Mask R-CNN. We propose to predict the keyword masks guided by text line detection. Firstly, proposals of text lines are generated by Faster R-CNN; Then, text line masks and keyword masks are predicted by segmentation in the proposals. In this way, the text lines and keywords are predicted in parallel. We create two Chinese keyword datasets based on RCTW-17 and ICPR MTWI2018 to verify the effectiveness of our method.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115228273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner
{"title":"A Genetic-Based Search for Adaptive Table Recognition in Spreadsheets","authors":"Elvis Koci, Maik Thiele, Oscar Romero, Wolfgang Lehner","doi":"10.1109/ICDAR.2019.00206","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00206","url":null,"abstract":"Spreadsheets are very successful content generation tools, used in almost every enterprise to create a wealth of information. However, this information is often intermingled with various formatting, layout, and textual metadata, making it hard to identify and interpret the tabular payload. Previous works proposed to solve this problem by mainly using heuristics. Although fast to implement, these approaches fail to capture the high variability of user-generated spreadsheet tables. Therefore, in this paper, we propose a supervised approach that is able to adapt to arbitrary spreadsheet datasets. We use a graph model to represent the contents of a sheet, which carries layout and spatial features. Subsequently, we apply genetic-based approaches for graph partitioning, to recognize the parts of the graph corresponding to tables in the sheet. The search for tables is guided by an objective function, which is tuned to match the specific characteristics of a given dataset. We present the feasibility of this approach with an experimental evaluation, on a large, real-world spreadsheet corpus.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115633820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shubham Paliwal, D. Vishwanath, R. Rahul, Monika Sharma, L. Vig
{"title":"TableNet: Deep Learning Model for End-to-end Table Detection and Tabular Data Extraction from Scanned Document Images","authors":"Shubham Paliwal, D. Vishwanath, R. Rahul, Monika Sharma, L. Vig","doi":"10.1109/ICDAR.2019.00029","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00029","url":null,"abstract":"With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables and extracting data from tabular sub-images presents a unique set of challenges. This includes accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine grained table structure(rows & columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach was evaluated on the publicly available ICDAR 2013 and Marmot Table datasets obtaining state of the art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114382703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blind Source Separation Based Framework for Multispectral Document Images Binarization","authors":"Abderrahmane Rahiche, A. Bakhta, M. Cheriet","doi":"10.1109/ICDAR.2019.00237","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00237","url":null,"abstract":"In this paper, we propose a novel Blind Source Separation (BSS) based framework for multispectral (MS) document images binarization. This framework takes advantage of the multidimensional data representation of MS images and makes use of the Graph regularized Non-negative Matrix Factorization (GNMF) to decompose MS document images into their different constituting components, i.e., foreground (text, ink), background (paper, parchment), degradation information, etc. The proposed framework is validated on two different real-world data sets of manuscript images showing a high capability of dealing with: variable numbers of bands regardless of the acquisition protocol, different types of degradations, and illumination non-uniformity while outperforming the results reported in the state-of-the-art. Although the focus was put on the binary separation (i.e., foreground/background), the proposed framework is also used for the decomposition of document images into different components, i.e., background, text, and degradation, which allows full sources separation, whereby further analysis and characterization of each component can be possible. A comparative study is performed using Independent Component Analysis (ICA) and Principal Component Analysis (PCA) methods. Our framework is also validated on another third dataset of MS images of natural objects to demonstrate its generalizability beyond document samples.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114442377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}