{"title":"Comparison of OCR Accuracy on Early Printed Books using the Open Source Engines Calamari and OCRopus","authors":"C. Wick, Christian Reul, F. Puppe","doi":"10.21248/jlcl.33.2018.219","DOIUrl":null,"url":null,"abstract":"This paper proposes a combination of a convolutional and an LSTM network to improve the accuracy of OCR on early printed books. While the default approach of line based OCR is to use a single LSTM layer as provided by the well-established OCR software OCRopus (OCRopy), we utilize a CNN-and Pooling-Layer combination in advance of an LSTM layer as implemented by the novel OCR software Calamari. Since historical prints often require book speci fi c models trained on manually labeled ground truth (GT) the goal is to maximize the recognition accuracy of a trained model while keeping the needed manual e ff ort to a minimum. We show, that the deep model signi fi cantly outperforms the shallow LSTM network when using both many and only a few training examples, although the deep network has a higher amount of trainable parameters. Hereby, the error rate is reduced by a factor of up to 55%, yielding character error rates (CER) of 1% and below for 1,000 lines of training. To further improve the results, we apply a con fi dence voting mechanism to achieve CERs below 0 . 5%. A simple data augmentation scheme and the usage of pretrained models reduces the CER further by up to 62% if only few training data is available. Thus, we require only 100 lines of GT to reach an average CER of 1.2%. The runtime of the deep model for training and prediction of a book behaves very similar to a shallow network when trained on a CPU. However, the usage of a GPU, as supported by Calamari, reduces the prediction time by a factor of at least four and the training time by more than six.","PeriodicalId":402489,"journal":{"name":"J. Lang. Technol. Comput. Linguistics","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"18","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Lang. Technol. Comput. Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21248/jlcl.33.2018.219","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 18
Abstract
This paper proposes a combination of a convolutional and an LSTM network to improve the accuracy of OCR on early printed books. While the default approach of line-based OCR is to use a single LSTM layer, as provided by the well-established OCR software OCRopus (OCRopy), we utilize a combination of CNN and pooling layers in front of an LSTM layer, as implemented by the novel OCR software Calamari. Since historical prints often require book-specific models trained on manually labeled ground truth (GT), the goal is to maximize the recognition accuracy of a trained model while keeping the required manual effort to a minimum. We show that the deep model significantly outperforms the shallow LSTM network both with many and with only a few training examples, even though the deep network has a considerably larger number of trainable parameters. The error rate is thereby reduced by up to 55%, yielding character error rates (CER) of 1% and below when training on 1,000 lines. To further improve the results, we apply a confidence voting mechanism and achieve CERs below 0.5%. A simple data augmentation scheme and the use of pretrained models reduce the CER by up to a further 62% when only little training data is available. Thus, we require only 100 lines of GT to reach an average CER of 1.2%. When trained on a CPU, the runtime of the deep model for training and prediction of a book is very similar to that of the shallow network. However, using a GPU, as supported by Calamari, reduces the prediction time by a factor of at least four and the training time by a factor of more than six.
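To make the architectural difference concrete, the following is a minimal sketch of such a deep line-recognition network in Keras: CNN and pooling layers in front of a bidirectional LSTM, with a softmax output suitable for CTC training. The function name, filter counts, and LSTM width are illustrative assumptions, not Calamari's exact configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_deep_ocr_model(height=48, num_classes=100):
    """CNN + pooling in front of an LSTM for OCR on text lines.

    Assumed input: a line image of fixed height and variable width,
    treated as a sequence along the width (time) axis.
    """
    inputs = tf.keras.Input(shape=(None, height, 1))  # (width, height, channels)
    x = layers.Conv2D(40, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Conv2D(60, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    # Flatten the (height, channels) axes per time step for the LSTM.
    x = layers.TimeDistributed(layers.Flatten())(x)
    x = layers.Bidirectional(layers.LSTM(200, return_sequences=True))(x)
    # One extra class for the CTC blank label; trained with CTC loss.
    outputs = layers.Dense(num_classes + 1, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

By contrast, the shallow OCRopus-style baseline would consist of only the bidirectional LSTM applied directly to the raw pixel columns of the line image.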
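The confidence voting step can be sketched in a similarly simplified form. Calamari's actual voter combines several models (e.g., from cross-fold training) and resolves disagreements between their decoded sequences by character confidence; the version below merely assumes that all voters share one alphabet and emit per-time-step probability matrices of equal length, averages them, and greedily decodes the result with the standard CTC collapse.

```python
import numpy as np

def vote_and_decode(prob_matrices, blank=0):
    """Average per-time-step confidences of several models, then CTC-decode.

    prob_matrices: list of arrays of shape (time_steps, num_classes),
    one per voting model, all aligned to the same time axis.
    """
    avg = np.mean(prob_matrices, axis=0)   # element-wise mean confidence
    best_path = np.argmax(avg, axis=1)     # greedy best label per step
    # CTC collapse: merge repeated labels, then drop the blank label.
    decoded, prev = [], None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(int(label))
        prev = label
    return decoded
```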