OCR Performance Prediction Using a Bag of Allographs and Support Vector Regression

2014 11th IAPR International Workshop on Document Analysis Systems

Pub Date: 2014-04-07

DOI: 10.1109/DAS.2014.72

T. Bhowmik, T. Paquet, N. Ragot

Citations: 6

Abstract

In this paper, we describe a novel and simple technique for predicting OCR results without running any OCR. The technique uses a bag of allographs to characterize textual components. A support vector regression (SVR) model is then trained on the bag-of-allographs representation to build a predictor. The performance of the system is evaluated on a corpus of historical documents. The proposed technique predicts OCR results on training and test documents to within a standard deviation of 4.18% and 6.54%, respectively. The system is designed as a tool to help libraries select corpora and to indicate the typical OCR performance that can be expected on a selection.
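The bag-of-allographs representation can be illustrated with a minimal sketch. The abstract does not say how the allographs or the component features are obtained, so everything below is an assumption made for illustration: the codebook is taken as already learned (for instance, by clustering shape descriptors of textual components), each component is reduced to a small feature vector, and the names `nearest_allograph`, `bag_of_allographs`, `codebook`, and `page_features` are hypothetical. The sketch covers only the histogram step, not the authors' actual pipeline.

```python
from collections import Counter
from math import dist


def nearest_allograph(feature, codebook):
    """Return the index of the codebook entry closest to the feature vector."""
    return min(range(len(codebook)), key=lambda k: dist(feature, codebook[k]))


def bag_of_allographs(features, codebook):
    """Build a normalized histogram of allograph assignments for one page."""
    if not features:
        return [0.0] * len(codebook)
    counts = Counter(nearest_allograph(f, codebook) for f in features)
    total = len(features)
    return [counts[k] / total for k in range(len(codebook))]


# Hypothetical toy data: three allograph prototypes and four component features.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
page_features = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

histogram = bag_of_allographs(page_features, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Each page is thus summarized by a fixed-length vector regardless of how many components it contains. Such vectors, paired with measured OCR accuracy on training pages, could then be passed to a regressor such as scikit-learn's `sklearn.svm.SVR`; the kernel and hyperparameters used in the paper are not stated in the abstract.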



