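A Novel Procedure to Speed up the Transcription of Historical Handwritten Documents by Interleaving Keyword Spotting and user Validation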

2019 International Conference on Document Analysis and Recognition (ICDAR), Pub Date: 2019-09-01, DOI: 10.1109/ICDAR.2019.00198

Adolfo Santoro, A. Marcelli

Citations: 3

Abstract

We propose a novel procedure to speed up the content transcription of handwritten documents in digital historical archives when a keyword spotting system is used for the purpose. Instead of validating the system outputs in a single step, as is customary, the proposed methodology envisages a multi-step validation process embedded in a human-in-the-loop approach. At each step, the system outputs are validated and, whenever the system mistakenly returns an image word that does not correspond to any entry of the keyword list, its correct transcription is entered and used to query the system in the next step. The performance of our approach has been evaluated experimentally in terms of the total time required to achieve the complete transcription of a subset of documents from the Bentham dataset. The results confirm that interleaving keyword spotting by the system and validation by the user leads to a significant reduction of the time required to transcribe the document content with respect to both manual transcription and the traditional end-of-the-loop validation process.
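The interleaved loop described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' system: the spotter `spot`, its error model (any word within edit distance 1 of the query is returned, so similar words appear as false positives), the toy word labels and the helper names are all hypothetical. The user is simulated by reading each word image's hidden ground-truth label. Correct hits are confirmed. A false positive whose word is not yet in the keyword list is typed in once and becomes a query for the next step. A false positive for a word already in the list is simply rejected.

```python
# Minimal simulation sketch of interleaved keyword spotting and user validation.
# Hypothetical throughout: the spotter, its error model and the toy data are
# illustrative assumptions, not the authors' system or the Bentham dataset.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here only to fake near-miss spotting errors."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def spot(query: str, ground_truth: dict[str, str], pending: set[str]) -> list[str]:
    """Hypothetical spotter: returns every untranscribed word image whose hidden
    label is within edit distance 1 of the query (so similar words are false positives)."""
    return sorted(img for img in pending if edit_distance(ground_truth[img], query) <= 1)


def interleaved_transcription(ground_truth: dict[str, str], keywords: list[str]):
    """Alternate spotting and validation steps until no new query words appear."""
    transcript: dict[str, str] = {}
    pending = set(ground_truth)       # word images not yet transcribed
    keyword_list = set(keywords)      # grows as the user enters new words
    queries = sorted(keyword_list)
    steps = 0
    while queries:
        steps += 1
        new_queries: list[str] = []
        for query in queries:
            for img in spot(query, ground_truth, pending):
                label = ground_truth[img]  # stands in for the user reading the image
                if label == query:
                    transcript[img] = label          # user confirms a correct hit
                elif label not in keyword_list:
                    transcript[img] = label          # user types the correct word...
                    keyword_list.add(label)
                    new_queries.append(label)        # ...which is queried in the next step
                else:
                    continue  # wrong hit for a word already in the list: reject it
                pending.discard(img)
        queries = new_queries
    # Images still pending were never accepted in any step: left for manual transcription.
    return transcript, pending, steps


ground_truth = {"w1": "law", "w2": "laws", "w3": "law",
                "w4": "laws", "w5": "lawful", "w6": "paw"}
transcript, left_for_manual, steps = interleaved_transcription(ground_truth, ["law"])
print(steps, transcript, sorted(left_for_manual))
```

In this toy run, `laws` and `paw` are returned as false positives for `law`. Each is typed in once and becomes a query in the second step, which then picks up the second `laws` instance that was rejected in the first step. `lawful` is never returned and remains for manual transcription.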
