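A Novel Procedure to Speed up the Transcription of Historical Handwritten Documents by Interleaving Keyword Spotting and User Validation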

Adolfo Santoro, A. Marcelli
2019 International Conference on Document Analysis and Recognition (ICDAR), September 2019. DOI: 10.1109/ICDAR.2019.00198 (https://doi.org/10.1109/ICDAR.2019.00198)
Citations: 3

Abstract

We propose a novel procedure to speed up the transcription of handwritten documents in digital historical archives when a keyword spotting system is used for the purpose. Instead of validating the system outputs in a single step, as is customary, the proposed methodology envisages a multi-step validation process embedded in a human-in-the-loop approach. At each step, the system outputs are validated and, whenever the system mistakenly returns an image word that does not correspond to any entry of the keyword list, its correct transcription is entered and used to query the system in the next step. The performance of our approach has been experimentally evaluated in terms of the total time needed to completely transcribe a subset of documents from the Bentham dataset. The results confirm that interleaving keyword spotting by the system and validation by the user significantly reduces the time required to transcribe the document content with respect to both manual transcription and the traditional end-of-the-loop validation process.
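The abstract does not specify the spotting model or the cost of each user action, so the following is only a minimal Python simulation of the interleaved loop it describes, not the authors' implementation. Several pieces are assumptions introduced for illustration: the hypothetical spotter `spot` returns every untranscribed word image whose hidden label is within edit distance 1 of the query (a stand-in for real spotting errors), and the per-action costs `T_CONFIRM`, `T_REJECT` and `T_TYPE` are made-up values, not figures from the paper. With `interleaved=True`, a wrongly returned word that is not in the keyword list is typed by the user and becomes a query for the next step; with `interleaved=False`, it is simply rejected, as in single-step validation. Words never recovered by spotting are transcribed by hand at the end.

```python
# Hypothetical per-action costs in seconds (illustrative assumptions only).
T_CONFIRM = 1.0  # user confirms a correct spotting result
T_REJECT = 1.0   # user rejects a wrong result without typing it
T_TYPE = 5.0     # user types the transcription of one word image


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here only to simulate spotting confusions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def spot(query: str, words: list[str], done: set[int]) -> list[int]:
    """Hypothetical spotter: indices of untranscribed word images whose
    hidden label is within edit distance 1 of the query."""
    return [i for i, w in enumerate(words)
            if i not in done and edit_distance(query, w) <= 1]


def transcribe(words: list[str], keywords: list[str], interleaved: bool) -> float:
    """Simulated total time to obtain a transcription of every word image."""
    done: set[int] = set()      # word images already transcribed
    known = set(keywords)       # current keyword list (grows if interleaved)
    queue = list(keywords)      # queries to issue in the current step
    total = 0.0
    while queue:
        next_queue = []
        for query in queue:
            for i in spot(query, words, done):
                if words[i] == query:
                    done.add(i)                 # correct hit: confirm it
                    total += T_CONFIRM
                elif interleaved and words[i] not in known:
                    # Out-of-list word: the user types it, which transcribes it
                    # and adds it to the keyword list for the next step.
                    done.add(i)
                    total += T_TYPE
                    known.add(words[i])
                    next_queue.append(words[i])
                else:
                    total += T_REJECT           # wrong hit: reject it
        queue = next_queue if interleaved else []
    # Anything the spotter never recovered is transcribed manually.
    total += T_TYPE * (len(words) - len(done))
    return total


def manual_time(words: list[str]) -> float:
    """Baseline: every word image is typed by hand."""
    return T_TYPE * len(words)


page = ["law", "lab", "lab", "lab", "law", "the", "the"]
print(transcribe(page, ["law"], interleaved=True))   # 21.0
print(transcribe(page, ["law"], interleaved=False))  # 30.0
print(manual_time(page))                             # 35.0
```

On this toy page, the first query `law` also returns the three `lab` images; in the interleaved run, typing the first of them turns `lab` into a new query whose hits are then merely confirmed, so fewer words remain for manual typing. These numbers only illustrate the mechanism under the assumed costs and say nothing about the paper's Bentham results.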