An End-to-End Trainable Framework for Joint Optimization of Document Enhancement and Recognition

2019 International Conference on Document Analysis and Recognition (ICDAR). Pub Date: 2019-09-01. DOI: 10.1109/ICDAR.2019.00019

Anupama Ray, Manoj Sharma, Avinash Upadhyay, Megh Makwana, S. Chaudhury, Akkshita Trivedi, Ajay Pratap Singh, Anil K. Saini


Citations: 10

Abstract



Recognizing text in degraded and low-resolution document images is still an open challenge in the vision community. Existing text recognition systems require a certain resolution and fail if the document is low-resolution, heavily degraded, or noisy. This paper presents an end-to-end trainable, deep-learning-based framework for joint optimization of document enhancement and recognition. We use a generative adversarial network (GAN) based framework to perform image denoising, followed by a deep back-projection network (DBPN) for super-resolution, and use the resulting super-resolved features to train a bidirectional long short-term memory (BLSTM) network with Connectionist Temporal Classification (CTC) to recognize text sequences. The entire network is end-to-end trainable, and we obtain better results than the state of the art on both the image enhancement and document recognition tasks. We demonstrate results on both printed and handwritten degraded document datasets to show the generalization capability of our proposed robust framework.
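The recognition stage named in the abstract (a BLSTM trained with CTC) emits one score vector per image column, and those per-frame outputs have to be collapsed into a text sequence. The sketch below illustrates the standard CTC best-path (greedy) rule: pick the top label per frame, merge consecutive repeats, then drop blanks. It is a minimal illustration only, not the authors' implementation; the abstract does not say which decoder the paper uses, and the function name `ctc_greedy_decode`, the blank index `0`, and the toy alphabet and scores are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores: list of per-frame score lists (one score per label index).
    alphabet: dict mapping non-blank label indices to characters.
    """
    # Step 1: take the highest-scoring label in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge consecutive repeats of the same label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: remove blanks; a blank between two equal labels keeps both.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Hypothetical toy input: 6 frames over {blank, "a", "b"}.
alphabet = {1: "a", 2: "b"}
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a (repeat, merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two "a" runs)
    [0.20, 0.60, 0.20],  # a
    [0.10, 0.20, 0.70],  # b
    [0.10, 0.10, 0.80],  # b (repeat, merged)
]
decoded = ctc_greedy_decode(frames, alphabet)
print(decoded)  # aab
```

The blank frame is what allows the doubled letter to survive: without it, the three "a" frames would merge into a single character.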

