How To Efficiently Increase Resolution in Neural OCR Models

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR). Pub Date: 2018-03-01. DOI: 10.1109/ASAR.2018.8480182

Stephen Rawls, Huaigu Cao, Joe Mathai, P. Natarajan

Citations: 2

Abstract

Modern CRNN OCR models require a fixed line height for all images, and it is known that, up to a point, increasing this input resolution improves recognition performance. However, simply increasing the line height of input images without changing the CRNN architecture has a large cost in memory and computation (both scale as O(n^2) with respect to the input line height). We introduce a few very small convolutional and max pooling layers to a CRNN model to rapidly downsample high-resolution images to a more manageable resolution before passing them to the "base" CRNN model. Doing this greatly improves recognition performance with a very modest increase in computation and memory requirements. We show a 33% relative improvement in WER, from 8.8% to 5.9%, when increasing the input resolution from 30px line height to 240px line height on Open-HART/MADCAT Arabic handwriting data. This is a new state-of-the-art result on Arabic handwriting, and the large improvement from an already strong baseline shows the impact of this technique.
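The arithmetic behind the abstract's figures can be checked with a short Python sketch. It rests on assumptions the abstract does not state: that line images keep their aspect ratio when resized (so width grows with height, which is one way to read the O(n^2) claim), that the aspect ratio is 20:1, and that the downsampling front end uses three 2x2 max pooling stages. The abstract says only "a few very small" layers, and the helper names here are hypothetical.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate, as a fraction of the baseline."""
    return (baseline - improved) / baseline


def pixel_count(line_height: int, aspect_ratio: float) -> float:
    """Pixels in a line image whose width is aspect_ratio times its height."""
    return line_height * (line_height * aspect_ratio)


def pooled_height(line_height: int, num_pools: int, pool: int = 2) -> int:
    """Height after num_pools non-overlapping max pooling stages (stride == pool)."""
    for _ in range(num_pools):
        line_height //= pool
    return line_height


# WER figures quoted in the abstract: 8.8% baseline, 5.9% at 240px.
wer_gain = relative_improvement(8.8, 5.9)

# Assumed 20:1 aspect ratio; the ratio cancels out, only the height change matters.
cost_ratio = pixel_count(240, 20.0) / pixel_count(30, 20.0)

# Assumed three 2x2 pooling stages: 240 -> 120 -> 60 -> 30.
front_end_out = pooled_height(240, num_pools=3)

print(f"relative WER improvement: {wer_gain:.1%}")
print(f"pixel count ratio, 240px vs 30px: {cost_ratio:.0f}x")
print(f"line height reaching the base CRNN: {front_end_out}px")
```

Under these assumptions, an 8x taller input carries 64x as many pixels, which is why feeding 240px lines directly to an unchanged CRNN is costly, while a pooling front end with a total stride of 8 hands the base model the same 30px height it was designed for.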
