Preserving Text Content from Historical Handwritten Documents

2016 12th IAPR Workshop on Document Analysis Systems (DAS). Pub Date: 2016-04-11. DOI: 10.1109/DAS.2016.77

Arpita Chakraborty, M. Blumenstein

Citations: 5

Abstract

We propose a holistic, dynamic method that preserves text content with zero tolerance for text loss while removing marginal noise from historical handwritten document images. The key idea is to identify and analyze, at each margin, the region between the sharp peak at the edge and the page frame of the text content. Depending on the proximity of the sharp peak to the text, the text content is then extracted from the document image. The method automatically adapts its thresholds to each individual document image and is directly applicable to gray-scale images. The proposed method is evaluated on four diverse handwritten historical datasets: Queensland State Archive (QSA), Saint Gall, Parzival and the Prosecution Project. Experimental results show that the proposed method achieves higher accuracy than other methods tested on the Saint Gall and Parzival datasets, while for the other two Australian datasets, which are introduced here for the first time, the results are very encouraging.
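The abstract does not give the algorithm itself, so the following is only a rough sketch of the general idea for the left margin, not the authors' method. It assumes the "sharp peak" is read from a column-wise darkness profile of the gray-scale image, that the per-image threshold is the mean of that profile, and that the margin zone is the leftmost quarter of the page. The names `column_darkness`, `left_crop_column` and `margin_fraction` are hypothetical.

```python
def column_darkness(image):
    """Mean darkness (255 - gray value) of each column of a gray-scale image.

    `image` is a list of rows, each row a list of gray values in 0..255.
    """
    height = len(image)
    width = len(image[0])
    return [
        sum(255 - image[r][c] for r in range(height)) / height
        for c in range(width)
    ]


def left_crop_column(image, margin_fraction=0.25):
    """Return the first column to keep when trimming left-margin noise.

    Assumed heuristic (not taken from the paper): locate the darkest column
    inside the left margin zone, then walk right until the profile falls
    below a threshold derived from this image alone.
    """
    profile = column_darkness(image)
    threshold = sum(profile) / len(profile)  # adapts to each image
    zone = max(1, int(len(profile) * margin_fraction))

    # Darkest column within the margin zone: the candidate "sharp peak".
    peak = max(range(zone), key=lambda c: profile[c])
    if profile[peak] <= threshold:
        return 0  # no marked peak in the margin, keep the whole page

    # Walk right across the dark band that contains the peak.
    col = peak
    while col < len(profile) and profile[col] > threshold:
        col += 1

    # If the dark band runs past the margin zone it touches the text,
    # so keep everything rather than risk cutting text content.
    if col > zone:
        return 0
    return col
```

The last check reflects the "zero tolerance" goal under these assumptions: when the peak is too close to the text to separate safely, the sketch leaves the margin uncropped instead of removing text.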



