How To Efficiently Increase Resolution in Neural OCR Models

Stephen Rawls, Huaigu Cao, Joe Mathai, P. Natarajan
DOI: 10.1109/ASAR.2018.8480182
Published in: 2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR), March 2018
Citations: 2

Abstract

Modern CRNN OCR models require a fixed line height for all input images, and it is known that, up to a point, increasing this input resolution improves recognition performance. However, doing so by simply increasing the line height of input images without changing the CRNN architecture carries a large cost in memory and computation (both scale as O(n²) with respect to the input line height). We introduce a few very small convolutional and max-pooling layers to a CRNN model that rapidly downsample high-resolution images to a more manageable resolution before passing them to the "base" CRNN model. This greatly improves recognition performance with a very modest increase in computation and memory requirements. We show a 33% relative improvement in WER, from 8.8% to 5.9%, when increasing the input resolution from a 30px line height to a 240px line height on Open-HART/MADCAT Arabic handwriting data. This is a new state-of-the-art result on Arabic handwriting, and the large improvement over an already strong baseline demonstrates the impact of this technique.
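The arithmetic behind the abstract can be sketched briefly. This is a minimal illustration, not the paper's actual architecture: it assumes the downsampling front-end is a stack of stride-2 pooling layers (so three of them reduce a 240px line to the 30px height the base CRNN expects), and that cost in the unmodified model scales quadratically with line height.

```python
def downsampled_height(input_height: int, num_pool_layers: int, stride: int = 2) -> int:
    """Line height after stacking `num_pool_layers` stride-`stride` pooling layers."""
    h = input_height
    for _ in range(num_pool_layers):
        h //= stride
    return h

def relative_cost(new_height: int, base_height: int) -> float:
    """Memory/compute multiplier for the unmodified CRNN, which scales O(n^2) in line height."""
    return (new_height / base_height) ** 2

# Three stride-2 pools: 240 -> 120 -> 60 -> 30, matching the base model's input height.
assert downsampled_height(240, 3) == 30

# Feeding 240px lines directly into the base CRNN would cost roughly (240/30)^2 = 64x more,
# which is why the cheap downsampling front-end matters.
print(relative_cost(240, 30))  # 64.0
```

This makes the trade-off concrete: the base CRNN still sees 30px inputs (so its recurrent layers cost the same), while the small convolutional front-end is the only part that touches the high-resolution image.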