Improvements in Handwritten and Printed Text Separation in Historical Archival Documents

Archiving: Final Program and Proceedings. IS&T's Archiving Conference. Pub Date: 2023-06-19. DOI: 10.2352/issn.2168-3204.2023.20.1.7

Mahsa Vafaie, J. Waitelonis, H. Sack


Abstract

The presence of handwritten text and annotations alongside typewritten and machine-printed text makes historical archival records visually complex, which poses challenges for OCR systems in accurately transcribing their content. This paper is an extension of [1]. It reports on improvements in the separation of handwritten text from machine-printed text (including typewritten text), using FCN-based models trained on datasets created with different data synthesis pipelines. Results show a significant increase of about 20% in the intrinsic evaluation on artificial test sets.



