{"title":"Amharic Text Image Recognition: Database, Algorithm, and Analysis","authors":"B. Belay, T. Habtegebrial, M. Liwicki, Gebeyehu Belay, D. Stricker","doi":"10.1109/ICDAR.2019.00205","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00205","url":null,"abstract":"This paper introduces a dataset for an exotic but very interesting script, Amharic. Amharic follows a unique syllabic writing system that uses 33 consonant characters, each with 7 vowel variants. Some labialized characters are derived by adding diacritical marks to consonants and/or removing parts of them. These diacritics on consonant characters are relatively small in size, which makes the derived (vowel and labialized) characters challenging to distinguish. In this paper we tackle the problem of Amharic text-line image recognition and propose a recurrent neural network based method that uses Long Short-Term Memory (LSTM) networks together with Connectionist Temporal Classification (CTC). Furthermore, to overcome the lack of annotated data, we introduce a new dataset that contains 337,332 Amharic text-line images, made freely available at http://www.dfki.uni-kl.de/~belay/. 
The performance of the proposed Amharic OCR model is tested on both printed and synthetically generated datasets, and promising results are obtained.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"426 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131813089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICDAR 2019 Time-Quality Binarization Competition","authors":"R. Lins, E. Kavallieratou, E. B. Smith, R. Bernardino, D. Jesus","doi":"10.1109/ICDAR.2019.00248","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00248","url":null,"abstract":"The ICDAR 2019 Time-Quality Binarization Competition assessed the performance of seventeen new binarization algorithms together with thirty previously published ones. Both the quality of the resulting two-tone image and the execution time were assessed. Comparisons were made on both \"real-world\" and synthetic scanned images, and on documents photographed with four models of widely used portable phones. Most of the submitted algorithms employed machine learning techniques and performed best on the most complex images. Traditional algorithms provided very good results at a fraction of the time.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132111215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DECO: A Dataset of Annotated Spreadsheets for Layout and Table Recognition","authors":"Elvis Koci, Maik Thiele, Josephine Rehak, Oscar Romero, Wolfgang Lehner","doi":"10.1109/ICDAR.2019.00207","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00207","url":null,"abstract":"This paper presents DECO (Dresden Enron COrpus), a dataset of spreadsheet files annotated on the basis of layout and contents. It comprises 1,165 files extracted from the Enron corpus. Three different annotators (judges) assigned layout roles (e.g., Header, Data, and Notes) to non-empty cells and marked the borders of tables. Files that do not contain tables were flagged using categories such as Template, Form, and Report. Subsequently, a thorough analysis is performed to uncover the characteristics of the overall dataset and of specific annotations. The results are discussed in this paper, providing several takeaways for future work. Furthermore, this work describes the annotation methodology in detail, going through the individual steps. The dataset, methodology, and tools are made publicly available so that they can be adopted in further studies. DECO is available at: https://wwwdb.inf.tu-dresden.de/research-projects/deexcelarator/","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132183761","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Text-Context-Aware CNN Network for Multi-oriented and Multi-language Scene Text Detection","authors":"Yao Xiao, Minglong Xue, Tong Lu, Yirui Wu, P. Shivakumara","doi":"10.1109/ICDAR.2019.00116","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00116","url":null,"abstract":"Existing deep learning based state-of-the-art scene text detection methods either treat scene texts as a type of general object or segment text regions directly. The latter category achieves remarkable detection results on arbitrary orientations and large aspect ratios of scene texts using instance segmentation algorithms. However, due to the lack of context information that accounts for the unique characteristics of scene text, directly applying instance segmentation to the text detection task is prone to low accuracy, especially false positive detection results. To address this problem, we propose a novel text-context-aware scene text detection CNN structure, which encodes channel and spatial attention information to construct a context-aware and discriminative feature map for multi-oriented and multi-language text detection tasks. With the high representation ability of the text-context-aware feature map, the proposed instance segmentation based method not only robustly detects multi-oriented and multi-language text in natural scene images, but also produces better text detection results by greatly reducing false positives. 
Experiments on the ICDAR2015 and ICDAR2017-MLT datasets show that the proposed method achieves superior performance in precision, recall and F-measure compared to most existing studies.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128283920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Procedure to Speed up the Transcription of Historical Handwritten Documents by Interleaving Keyword Spotting and user Validation","authors":"Adolfo Santoro, A. Marcelli","doi":"10.1109/ICDAR.2019.00198","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00198","url":null,"abstract":"We propose a novel procedure to speed up the content transcription of handwritten documents in digital historical archives when a keyword spotting system is used for the purpose. Instead of validating the system outputs in a single step, as is customary, the proposed methodology envisages a multi-step validation process embedded in a human-in-the-loop approach. At each step, the system outputs are validated and, whenever the system mistakenly returns an image word that does not correspond to any entry of the keyword list, its correct transcription is entered and used to query the system in the next step. The performance of our approach has been experimentally evaluated in terms of the total time needed to achieve the complete transcription of a subset of documents from the Bentham dataset. 
The results confirm that interleaving keyword spotting by the system with validation by the user leads to a significant reduction in the time required to transcribe the document content, compared to both manual transcription and the traditional end-of-the-loop validation process.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134086431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hiding Security Feature Into Text Content for Securing Documents Using Generated Font","authors":"Vinh Loc Cu, J. Burie, J. Ogier, Cheng-Lin Liu","doi":"10.1109/ICDAR.2019.00196","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00196","url":null,"abstract":"Motivated by the increasing possibility of tampering with genuine documents during transmission over digital channels, we focus on developing a watermarking framework for determining whether a given document is genuine or falsified. The proposed framework hides a security feature, or secret information, within the document. To hide the security feature, we replace appropriate characters of the legal document with equivalent characters coming from generated fonts, called hereafter variations of characters. These variations are produced by training generative adversarial networks (GANs) on features of the characters' skeletons and normal shapes. To detect the hidden information, we use fully convolutional networks (FCNs) to produce salient regions from the watermarked document. The salient regions mark the positions in the document where characters have been substituted by their variations, and these positions are used as a reference for extracting the hidden information. 
Lastly, we demonstrate that our approach achieves high precision in data detection and competitive performance compared to state-of-the-art approaches.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134109919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICDAR 2019 Competition on Large-Scale Street View Text with Partial Labeling - RRC-LSVT","authors":"Yipeng Sun, Zihan Ni, Chee-Kheng Chng, Yuliang Liu, Canjie Luo, Chun Chet Ng, Junyu Han, Errui Ding, Jingtuo Liu, Dimosthenis Karatzas, Chee Seng Chan, Lianwen Jin","doi":"10.1109/icdar.2019.00250","DOIUrl":"https://doi.org/10.1109/icdar.2019.00250","url":null,"abstract":"Robust text reading from street view images provides valuable information for various applications. Performance improvement of existing methods in such a challenging scenario heavily relies on the amount of fully annotated training data, which is costly and inefficient to obtain. To scale up the amount of training data while keeping the labeling procedure cost-effective, this competition introduces a new challenge on Large-scale Street View Text with Partial Labeling (LSVT), providing 50,000 and 400,000 images with full and weak annotations, respectively. This competition aims to explore the abilities of state-of-the-art methods to detect and recognize text instances from large-scale street view images, closing gaps between research benchmarks and real applications. During the competition period, a total of 41 teams participated in the two tasks, i.e., text detection and end-to-end text spotting, with 132 valid submissions. 
This paper presents the dataset description, task definitions, evaluation protocols and result summaries of the ICDAR 2019-LSVT challenge.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133219525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Welcome Message from the Program Chairs","authors":"G. Feigin","doi":"10.1109/icdar.2019.00007","DOIUrl":"https://doi.org/10.1109/icdar.2019.00007","url":null,"abstract":"Welcome to Vancouver and the 20th Annual Conference of the Production and Operations Management Society! We received 988 abstracts for this conference. These submissions have been clustered into 19 tracks across the entire spectrum of Operations Management. In keeping with the theme of the conference, one of these tracks is titled “Operations in Emerging Economies” and features presentations from researchers in various countries. We would like to thank all the track chairs for their hard work in soliciting speakers and helping to put the program together. The track chairs are:","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116653393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handwriting Recognition Based on Temporal Order Restored by the End-to-End System","authors":"Besma Rabhi, A. Elbaati, Y. Hamdi, A. Alimi","doi":"10.1109/ICDAR.2019.00199","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00199","url":null,"abstract":"In this paper, we present an original framework for offline handwriting recognition. Our recognition system is based on a sequence-to-sequence model employing an encoder-decoder LSTM to recover temporal order from offline handwriting. Handwriting temporal recovery consists of two parts: extracting features using a Convolutional Neural Network (CNN) followed by an LSTM layer, and decoding the encoded vectors with a BLSTM to generate temporal information. To produce a human-like velocity, we apply a sampling operation that takes trajectory curvatures into account. Our work is validated with an LSTM recognition system based on the Beta-Elliptic model, applied to an Arabic and Latin on/off dual handwriting character database.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122178974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WiSe — Slide Segmentation in the Wild","authors":"Monica Haurilet, Alina Roitberg, Manuel Martínez, R. Stiefelhagen","doi":"10.1109/ICDAR.2019.00062","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00062","url":null,"abstract":"We address the task of segmenting presentation slides, where the examined page was captured as a live photo during lectures. Slides are important document types used as visual components accompanying presentations in a variety of fields ranging from education to business. However, automatic analysis of presentation slides has not been researched sufficiently, and, so far, only preprocessed images of already digitized slide documents have been considered. We introduce the task of analyzing unconstrained photos of slides taken during lectures and present a novel dataset for Page Segmentation with slides captured in the Wild (WiSe). Our dataset covers pixel-wise annotations of 25 classes on 1,300 pages, allowing overlapping regions (i.e., multi-class assignments). To evaluate performance, we define multiple benchmark metrics and baseline methods for our dataset. We further implement two different deep neural network approaches previously used for segmenting natural images and adapt them to the task. Our evaluation results demonstrate the effectiveness of the deep learning-based methods, which surpass the baseline methods by over 30%. 
To foster further research on slide analysis in unconstrained photos, we make the WiSe dataset publicly available to the community.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122270756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}