如何有效提高神经OCR模型的分辨率

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR) Pub Date : 2018-03-01 DOI:10.1109/ASAR.2018.8480182

Stephen Rawls, Huaigu Cao, Joe Mathai, P. Natarajan

{"title":"如何有效提高神经OCR模型的分辨率","authors":"Stephen Rawls, Huaigu Cao, Joe Mathai, P. Natarajan","doi":"10.1109/ASAR.2018.8480182","DOIUrl":null,"url":null,"abstract":"Modern CRNN OCR models require a fixed line height for all images, and it is known that, up to a point, increasing this input resolution improves recognition performance. However, doing so by simply increasing the line height of input images without changing the CRNN architecture has a large cost in memory and computation (they both scale O(n2) w.r.t. the input line height).We introduce a few very small convolutional and max pooling layers to a CRNN model to rapidly downsample high resolution images to a more manageable resolution before passing off to the \"base\" CRNN model. Doing this greatly improves recognition performance with a very modest increase in computation and memory requirements. We show a 33% relative improvement in WER, from 8.8% to 5.9% when increasing the input resolution from 30px line height to 240px line height on Open-HART/MADCAT Arabic handwriting data.This is a new state of the art result on Arabic handwriting, and the large improvement from an already strong baseline shows the impact of this technique.","PeriodicalId":165564,"journal":{"name":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"How To Efficiently Increase Resolution in Neural OCR Models\",\"authors\":\"Stephen Rawls, Huaigu Cao, Joe Mathai, P. Natarajan\",\"doi\":\"10.1109/ASAR.2018.8480182\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modern CRNN OCR models require a fixed line height for all images, and it is known that, up to a point, increasing this input resolution improves recognition performance. However, doing so by simply increasing the line height of input images without changing the CRNN architecture has a large cost in memory and computation (they both scale O(n2) w.r.t. the input line height).We introduce a few very small convolutional and max pooling layers to a CRNN model to rapidly downsample high resolution images to a more manageable resolution before passing off to the \\\"base\\\" CRNN model. Doing this greatly improves recognition performance with a very modest increase in computation and memory requirements. We show a 33% relative improvement in WER, from 8.8% to 5.9% when increasing the input resolution from 30px line height to 240px line height on Open-HART/MADCAT Arabic handwriting data.This is a new state of the art result on Arabic handwriting, and the large improvement from an already strong baseline shows the impact of this technique.\",\"PeriodicalId\":165564,\"journal\":{\"name\":\"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASAR.2018.8480182\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAR.2018.8480182","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

现代CRNN OCR模型需要所有图像的固定线高度，并且已知，在一定程度上，增加输入分辨率可以提高识别性能。然而，通过简单地增加输入图像的行高而不改变CRNN架构来做到这一点在内存和计算方面有很大的成本(它们都是输入行高的w.r.t. O(n2))。我们在CRNN模型中引入了一些非常小的卷积层和最大池化层，以便在传递到“基础”CRNN模型之前将高分辨率图像快速下采样到更易于管理的分辨率。这样做可以极大地提高识别性能，而计算和内存需求的增加非常适度。在Open-HART/MADCAT阿拉伯手写数据上，当将输入分辨率从30px线高提高到240px线高时，我们显示了33%的相对改进，从8.8%提高到5.9%。这是对阿拉伯笔迹的最新研究成果，从已经很强的基线上的巨大改进显示了这种技术的影响。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

How To Efficiently Increase Resolution in Neural OCR Models

Modern CRNN OCR models require a fixed line height for all images, and it is known that, up to a point, increasing this input resolution improves recognition performance. However, doing so by simply increasing the line height of input images without changing the CRNN architecture has a large cost in memory and computation (they both scale O(n2) w.r.t. the input line height).We introduce a few very small convolutional and max pooling layers to a CRNN model to rapidly downsample high resolution images to a more manageable resolution before passing off to the "base" CRNN model. Doing this greatly improves recognition performance with a very modest increase in computation and memory requirements. We show a 33% relative improvement in WER, from 8.8% to 5.9% when increasing the input resolution from 30px line height to 240px line height on Open-HART/MADCAT Arabic handwriting data.This is a new state of the art result on Arabic handwriting, and the large improvement from an already strong baseline shows the impact of this technique.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 IEEE 2nd International Workshop on Arabic and Derived Script Analysis and Recognition (ASAR)

自引率

0.00%

发文量