OCR Error Correction Using Character Correction and Feature-Based Word Classification

2016 12th IAPR Workshop on Document Analysis Systems (DAS) Pub Date : 2016-04-11 DOI:10.1109/DAS.2016.44

Ido Kissos, N. Dershowitz

Citations: 67

Abstract

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, corrects the vast majority of segmentation and recognition errors, the most frequent error types in our dataset.
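The abstract names two ingredients: a weighted confusion matrix and a shallow language model. The following is a minimal sketch of how such signals might be combined to pick a correction for a single token; it is not the authors' implementation. The confusion weights, the unigram counts, the `IDENTITY_WEIGHT` constant, and the function names are all hypothetical, Latin-script tokens are used only for readability (the paper's experiments are on Arabic), and a fixed sum of log scores stands in for the learned classifier the paper describes.

```python
import math

# Hypothetical confusion weights: (string OCR produced, string intended) -> weight.
# These values are illustrative assumptions, not figures from the paper.
CONFUSIONS = {
    ("rn", "m"): 0.3,
    ("l", "i"): 0.1,
    ("0", "o"): 0.2,
}

# Assumed weight for leaving the token unchanged.
IDENTITY_WEIGHT = 0.9

# Toy unigram counts standing in for a shallow language model (assumed data).
UNIGRAMS = {"modern": 50, "modem": 5, "model": 40}


def candidates(token):
    """Yield (candidate, weight) pairs: the token itself plus every
    single-substitution variant allowed by the confusion table."""
    yield token, IDENTITY_WEIGHT
    for (seen, intended), weight in CONFUSIONS.items():
        start = token.find(seen)
        while start != -1:
            yield token[:start] + intended + token[start + len(seen):], weight
            start = token.find(seen, start + 1)


def score(candidate, weight):
    """Log confusion weight plus log of an add-one-smoothed unigram probability."""
    total = sum(UNIGRAMS.values())
    p_lm = (UNIGRAMS.get(candidate, 0) + 1) / (total + len(UNIGRAMS) + 1)
    return math.log(weight) + math.log(p_lm)


def correct(token):
    """Return the highest-scoring candidate for one OCR token."""
    return max(candidates(token), key=lambda pair: score(*pair))[0]


print(correct("rnodel"))  # OCR misread "m" as "rn"
print(correct("modern"))  # already a frequent word; left unchanged
```

In a setup closer to the one the abstract describes, the two log terms would become features fed to a trained classifier, alongside others such as bigram context, rather than being added with equal weight.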

