Handwritten and Machine-Printed Text Discrimination Using a Template Matching Approach

2016 12th IAPR Workshop on Document Analysis Systems (DAS), publication date 2016-06-13, DOI: 10.1109/DAS.2016.22

Mehryar Emambakhsh, Yulan He, I. Nabney

Citations: 5

Abstract

We propose a novel template matching approach for discriminating between handwritten and machine-printed text. We first pre-process the scanned document images by denoising them, excluding circles and lines, and segmenting them at the word-block level. We then align and match the characters of a flexible-sized gallery against the segmented regions using parallelised normalised cross-correlation. Experimental results on the Pattern Recognition & Image Analysis Research Lab-Natural History Museum (PRImA-NHM) dataset show that the algorithm is highly robust when classifying cluttered, occluded and noisy samples, as well as samples with a significant amount of missing data. The algorithm achieves a classification rate of 84.0% with a false positive rate of 0.16 on this dataset, requires no training samples, and produces compelling results compared with training-based approaches evaluated on the same benchmark.
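The matching step can be illustrated with a minimal pure-Python sketch of zero-mean normalised cross-correlation (NCC) template matching. It assumes grayscale word blocks and gallery characters given as small 2-D lists of intensities. The names `ncc`, `best_match` and `classify_block`, the 0.8 threshold, and the rule "label a block machine-printed if any gallery character scores above the threshold" are hypothetical illustrations, because the abstract does not state how match scores are turned into labels. Pre-processing (denoising, circle/line exclusion, segmentation) and alignment are omitted, and the sketch runs serially rather than in parallel.

```python
import math
from typing import List, Sequence

# Grayscale image as a list of rows of pixel intensities (assumed representation).
Image = List[List[float]]


def ncc(patch: Image, template: Image) -> float:
    """Zero-mean normalised cross-correlation of two equal-sized images.

    The result lies in [-1, 1]. If either input has zero variance the
    correlation is undefined, and 0.0 is returned instead.
    """
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)
    dp = [v - mean_p for v in p]
    dt = [v - mean_t for v in t]
    denom = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dt))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dp, dt)) / denom


def best_match(region: Image, template: Image) -> float:
    """Slide `template` over every valid offset in `region`; return the top NCC score."""
    th, tw = len(template), len(template[0])
    rh, rw = len(region), len(region[0])
    best = -1.0
    for y in range(rh - th + 1):
        for x in range(rw - tw + 1):
            patch = [row[x:x + tw] for row in region[y:y + th]]
            best = max(best, ncc(patch, template))
    return best


def classify_block(region: Image, gallery: Sequence[Image],
                   threshold: float = 0.8) -> str:
    """Hypothetical rule: 'printed' if any gallery character matches strongly.

    Templates larger than the region are skipped. Each template is scored
    independently, so this loop is the natural place to parallelise.
    """
    scores = [best_match(region, t) for t in gallery
              if len(t) <= len(region) and len(t[0]) <= len(region[0])]
    return "printed" if scores and max(scores) >= threshold else "handwritten"


if __name__ == "__main__":
    # A 3x3 "plus" glyph standing in for one gallery character.
    plus = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
    # Word block containing an exact copy of the glyph at offset (1, 1).
    printed_block = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 1, 1], [0, 0, 1, 0]]
    # Word block whose strokes do not resemble the glyph.
    other_block = [[1, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
    print(best_match(printed_block, plus))        # close to 1.0
    print(classify_block(printed_block, [plus]))  # printed
    print(classify_block(other_block, [plus]))    # handwritten
```

Zero-mean NCC is used in this sketch because subtracting each window's mean and dividing by its spread makes the score insensitive to uniform changes in brightness and contrast, which is useful for scanned pages. A practical implementation would vectorise the sliding window (for example with OpenCV's `matchTemplate` and `TM_CCOEFF_NORMED`) instead of looping in Python.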
