{"title":"Care Label Recognition","authors":"Jiri Kralicek, Jiri Matas, M. Busta","doi":"10.1109/ICDAR.2019.00158","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00158","url":null,"abstract":"The paper introduces the problem of care label recognition and presents a method addressing it. A care label, also called a care tag, is a small piece of cloth or paper attached to a garment, providing instructions for its maintenance and information about, e.g., the material and size. The information and instructions are written as symbols or plain text. Care label recognition is a challenging text and pictogram recognition problem - the often sewn text is small, looking as if printed using a non-standard font, and the contrast of the text gradually fades, making OCR progressively more difficult. On the other hand, the information provided is typically redundant, which facilitates semi-supervised learning. The presented care label recognition method is based on the recently published End-to-End Method for Multi-Language Scene Text, E2E-MLT, Busta et al. 2018, exploiting specific constraints, e.g. a care label vocabulary with multi-language equivalences. Experiments conducted on a newly-created dataset of 63 care label images show that even when exploiting problem-specific constraints, a state-of-the-art scene text detection and recognition method achieves precision and recall only slightly above 0.6, confirming the challenging nature of the problem.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134355075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Welcome Message from the Honorary Chair","authors":"H. Makino","doi":"10.1109/icdar.2019.00005","DOIUrl":"https://doi.org/10.1109/icdar.2019.00005","url":null,"abstract":"","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131569652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid DBLSTM-SVM Based Beta-Elliptic-CNN Models for Online Arabic Characters Recognition","authors":"Y. Hamdi, H. Boubaker, Thameur Dhieb, A. Elbaati, A. Alimi","doi":"10.1109/ICDAR.2019.00093","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00093","url":null,"abstract":"Deep learning-based approaches have proven highly successful in handwriting recognition, a challenging task with increasingly broad applications in mobile devices. Recently, several research initiatives in the area of pattern recognition have been introduced. The challenge is greater for Arabic scripts due to the inherent cursiveness of their characters, the existence of several groups of characters with similar shapes, the large size of the alphabet, etc. In this paper, we propose an online Arabic character recognition system based on hybrid Beta-Elliptic model (BEM) and convolutional neural network (CNN) feature extractor models, combining deep bidirectional long short-term memory (DBLSTM) and support vector machine (SVM) classifiers. First, we use the extracted online and offline features to perform classification and compare the performance of single classifiers. Second, we combine the two types of feature-based systems using different combination methods to enhance the discriminating power of the global system. We have evaluated our system on the LMCA and Online-KHATT databases. The obtained recognition rates reach a maximum of 95.48% and 91.55% for the individual systems on the two databases, respectively. The combination of the online and offline systems improves the accuracy to 99.11% and 93.98% on the same databases, exceeding the best results of other state-of-the-art systems.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131800897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Augmentation via Adversarial Networks for Optical Character Recognition/Conference Submissions","authors":"Victor Storchan","doi":"10.1109/ICDAR.2019.00038","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00038","url":null,"abstract":"With the ongoing digitalization of resources across the industry, robust OCR (Optical Character Recognition) solutions are highly valuable. In this work, we aim at designing models to read typically damaged faxes and PDF files and at training them with unlabeled data. State-of-the-art deep learning architectures require scalable tagged datasets that are often difficult and costly to collect. To ensure compliance standards, or to provide reproducible, cheap, and fast solutions for training OCR systems, producing datasets that mimic the quality of the data that will be passed to the model is paramount. In this paper, we discuss using unsupervised image-to-image translation methods to learn transformations that map clean images of words to damaged images of words. The quality of the transformation is evaluated through the OCR component, and these results are compared to the Inception Score (IS) of the GANs we used. That way, we are able to generate an arbitrarily large realistic dataset without labeling a single observation. As a result, we propose an end-to-end OCR training solution that provides competitive models.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131198391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content Extraction from Lecture Video via Speaker Action Classification Based on Pose Information","authors":"Fei Xu, Kenny Davila, S. Setlur, V. Govindaraju","doi":"10.1109/ICDAR.2019.00171","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00171","url":null,"abstract":"Online lecture videos are increasingly important e-learning materials for students. Automated content extraction from lecture videos facilitates information retrieval applications that improve access to the lecture material. A significant number of lecture videos include the speaker in the image. Speakers perform various semantically meaningful actions during the process of teaching. Among all the movements of the speaker, key actions such as writing or erasing potentially indicate important features directly related to the lecture content. In this paper, we present a methodology for lecture video content extraction using the speaker actions. Each lecture video is divided into small temporal units called action segments. Using a pose estimator, body and hand skeleton data are extracted and used to compute motion-based features describing each action segment. Then, the dominant speaker action of each of these segments is classified using random forests and the motion-based features. With the temporal and spatial range of these actions, we implement an alternative way to extract key-frames of handwritten content from the video. In addition, for our fixed-camera videos, we also use the skeleton data to compute a mask of the speaker's writing locations for the subtraction of background noise from the binarized key-frames. Our method has been tested on a publicly available lecture video dataset, and it shows reasonable recall and precision results, with a compression ratio better than that of previous methods based on content analysis.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115481638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quality and Time Assessment of Binarization Algorithms","authors":"R. Lins, R. Bernardino, D. Jesus","doi":"10.1109/ICDAR.2019.00232","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00232","url":null,"abstract":"Binarization algorithms are an important step in most document analysis and recognition applications. Many aspects of the document affect the performance of binarization algorithms, such as paper texture and color, noise such as back-to-front interference, stains, and even the type and color of the ink. This work focuses on determining how each document characteristic impacts the processing time and the quality of the binarized image. This paper assesses thirty of the most widely used document binarization algorithms.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"263 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115665617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DICE: Deep Intelligent Contextual Embedding for Twitter Sentiment Analysis","authors":"Usman Naseem, Katarzyna Musial","doi":"10.1109/ICDAR.2019.00157","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00157","url":null,"abstract":"The sentiment analysis of social media-based short text (e.g., Twitter messages) is very valuable and is increasingly explored in communities such as text analysis, social media analysis, and recommendation. However, it is challenging, as tweet-like social media text is often short, informal, and noisy, and involves language ambiguity such as polysemy. Existing sentiment analysis approaches mainly target documents and clean textual data. Accordingly, we propose a Deep Intelligent Contextual Embedding (DICE), which enhances tweet quality by handling noise within contexts and then integrates four embeddings to capture polysemy in context, semantics, syntax, and sentiment knowledge of the words in a tweet. DICE is then fed to a Bi-directional Long Short-Term Memory (BiLSTM) network with attention to determine the sentiment of a tweet. The experimental results show that our model outperforms several baselines, including both classic classifiers and combinations of various word embedding models, in the sentiment analysis of airline-related tweets.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114394967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthesis of Handwriting Dynamics using Sinusoidal Model","authors":"Himakshi Choudhury, S. Prasanna","doi":"10.1109/ICDAR.2019.00144","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00144","url":null,"abstract":"Handwriting production is a complex mechanism of fine motor control, associated mainly with two degrees of freedom in the horizontal and vertical directions. The relation between the horizontal and vertical velocities depends on the trajectory shape and its length. In this work, we explore the generation of handwriting velocities using two sinusoidal oscillations. The proposed method follows the motor equivalence theory and considers that the patterns are stored in the form of a sequence of corner shapes and their relative locations in the letter. These points are referred to as the modulation points, where the parameters of the sinusoidal oscillations are modulated to generate the required velocity profiles. Depending on the location and shape of the corners, the amplitude, phase, and frequency relations between the two underlying oscillations are changed. Accordingly, this paper presents an efficient method to synthesize the velocity profiles and hence the handwriting. Further, shape variability in the synthesized data can also be introduced by modifying the positions of the modulation points and their corner shapes. The quality of the synthesized handwriting is evaluated using both subjective and quantitative evaluation methods.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114695295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Robust Hybrid Approach for Textual Document Classification","authors":"M. Asim, Muhammad Usman Ghani Khan, M. I. Malik, A. Dengel, Sheraz Ahmed","doi":"10.1109/ICDAR.2019.00224","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00224","url":null,"abstract":"Text document classification is an important task for diverse natural language processing based applications. Traditional machine learning approaches mainly focused on reducing the dimensionality of textual data to perform classification. Although this improved overall classification accuracy, the classifiers still faced a sparsity problem due to the lack of better data representation techniques. Deep learning based text document classification, on the other hand, benefited greatly from the invention of word embeddings, which solved the sparsity problem, and researchers' focus remained mainly on the development of deep architectures. Deeper architectures, however, learn some redundant features that limit the performance of deep learning based solutions. In this paper, we propose a two-stage text document classification methodology which combines traditional feature engineering with automatic feature engineering (using deep learning). The proposed methodology comprises a filter based feature selection (FSE) algorithm followed by a deep convolutional neural network. This methodology is evaluated on the two most commonly used public datasets, i.e., 20 Newsgroups data and BBC news data. Evaluation results reveal that the proposed methodology outperforms the state-of-the-art of both the (traditional) machine learning and deep learning based text document classification methodologies, with a significant margin of 7.7% on the 20 Newsgroups dataset and 6.6% on the BBC news dataset.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116009379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Capturing Micro Deformations from Pooling Layers for Offline Signature Verification","authors":"Yuchen Zheng, W. Ohyama, Brian Kenji Iwana, S. Uchida","doi":"10.1109/ICDAR.2019.00180","DOIUrl":"https://doi.org/10.1109/ICDAR.2019.00180","url":null,"abstract":"In this paper, we propose a novel Convolutional Neural Network (CNN) based method that, as a feature extraction procedure, extracts the location information (displacement features) of the maximums in the max-pooling operation and fuses it with the pooling features to capture the micro deformations between genuine signatures and skilled forgeries. After the feature extraction procedure, we apply support vector machines (SVMs) as writer-dependent classifiers for each user to build the signature verification system. Extensive experimental results on the GPDS-150, GPDS-300, GPDS-1000, GPDS-2000, and GPDS-5000 datasets demonstrate that the proposed method can discriminate well between genuine signatures and their corresponding skilled forgeries and achieves state-of-the-art results on these datasets.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123452733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}