{"title":"Data Augmentation via Adversarial Networks for Optical Character Recognition/Conference Submissions","authors":"Victor Storchan","doi":"10.1109/ICDAR.2019.00038","DOIUrl":null,"url":null,"abstract":"With the ongoing digitalization of ressources across the industry, robust OCR solutions (Optical Character Recognition) are highly valuable. In this work, we aim at designing models to read typical damaged faxes and PDF files and training them with unlabeled data. State-of-art deep learning architectures require scalable tagged datasets that are often difficult and costly to collect. To ensure compliance standards or to provide reproducible cheap and fast solutions for training OCR systems, producing datasets that mimic the quality of the data that will be passed to the model is paramount. In this paper we discuss using unsupervised image-to-image translation methods to learn transformations that aim to map clean images of words to damaged images of words. The quality of the transformation is evaluated through the OCR brick and these results are compared to the Inception Score (IS) of the GANs we used. That way we are able to generate an arbitrary large realistic dataset without labeling a single observation. As a result, we propose an end-to-end OCR training solution to provide competitive models.","PeriodicalId":325437,"journal":{"name":"2019 International Conference on Document Analysis and Recognition (ICDAR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Document Analysis and Recognition (ICDAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2019.00038","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
With the ongoing digitalization of resources across the industry, robust OCR (Optical Character Recognition) solutions are highly valuable. In this work, we aim to design models that read typical damaged faxes and PDF files, and to train them with unlabeled data. State-of-the-art deep learning architectures require large labeled datasets that are often difficult and costly to collect. To meet compliance standards, or to provide reproducible, cheap, and fast solutions for training OCR systems, it is paramount to produce datasets that mimic the quality of the data that will be passed to the model. In this paper we discuss using unsupervised image-to-image translation methods to learn transformations that map clean images of words to damaged images of words. The quality of the transformation is evaluated through the OCR module, and these results are compared to the Inception Score (IS) of the GANs we used. In this way, we are able to generate an arbitrarily large realistic dataset without labeling a single observation. As a result, we propose an end-to-end OCR training solution that yields competitive models.
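To make the clean-to-damaged mapping concrete, below is a minimal sketch of one unpaired training step in a CycleGAN-style setup, a standard unsupervised image-to-image translation method. This is an illustrative assumption, not the paper's exact architecture; the network shapes, the 1x64x256 word-crop size, and names like `G_c2d` are hypothetical.

```python
# Hedged sketch: unpaired clean->damaged word-image translation, CycleGAN-style.
# Architecture, image size (1x64x256), and loss weights are assumptions for
# illustration; the abstract does not specify them.
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Tiny encoder-decoder mapping a 1x64x256 word image to the other domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)

class Discriminator(nn.Module):
    """PatchGAN-style critic: real damaged word images vs. generated ones."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=1, padding=1),
        )
    def forward(self, x):
        return self.net(x)

G_c2d, G_d2c = Generator(), Generator()   # clean->damaged, damaged->clean
D_d = Discriminator()                     # judges the damaged domain
opt_G = torch.optim.Adam(list(G_c2d.parameters()) + list(G_d2c.parameters()), lr=2e-4)
opt_D = torch.optim.Adam(D_d.parameters(), lr=2e-4)
l1, mse = nn.L1Loss(), nn.MSELoss()

def train_step(clean, damaged):
    """One unpaired step: LSGAN adversarial loss + one-way cycle consistency."""
    fake_damaged = G_c2d(clean)
    # Generator: fool D_d and reconstruct the clean input (cycle loss).
    pred = D_d(fake_damaged)
    loss_G = mse(pred, torch.ones_like(pred)) + 10.0 * l1(G_d2c(fake_damaged), clean)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    # Discriminator: real damaged images vs. detached fakes.
    pred_real, pred_fake = D_d(damaged), D_d(fake_damaged.detach())
    loss_D = 0.5 * (mse(pred_real, torch.ones_like(pred_real))
                    + mse(pred_fake, torch.zeros_like(pred_fake)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    return loss_G.item(), loss_D.item()

# Usage with dummy unpaired batches of 1x64x256 word crops in [-1, 1]:
clean = torch.rand(8, 1, 64, 256) * 2 - 1
damaged = torch.rand(8, 1, 64, 256) * 2 - 1
print(train_step(clean, damaged))
```

Because the translation only degrades a clean image whose transcription is already known, every generated damaged image inherits its source label for free, which is how the approach yields labeled training data "without labeling a single observation."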
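The Inception Score mentioned above has a standard definition, IS = exp(E_x[KL(p(y|x) || p(y))]). Below is a minimal sketch of that computation; in practice the per-image class probabilities come from a pretrained Inception network, while here `probs` is a placeholder array.

```python
# Hedged sketch of the Inception Score (IS) from per-image class probabilities.
# `probs` stands in for softmax outputs of a pretrained Inception classifier.
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (N, K) rows of p(y|x_i); returns exp(mean_i KL(p(y|x_i) || p(y)))."""
    marginal = probs.mean(axis=0)  # p(y), the marginal over generated images
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Dummy usage: 1000 fake "predictions" over 10 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10) * 0.3, size=1000)
print(inception_score(probs))
```

The paper's point is to compare this generic image-quality score against a task-specific one: how well the downstream OCR module performs when trained on the generated data.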