A Coarse-to-Fine Approach for Handwritten Word Spotting in Large Scale Historical Documents Collection

2012 International Conference on Frontiers in Handwriting Recognition Pub Date : 2012-09-18 DOI:10.1109/ICFHR.2012.151

Jon Almazán, D. F. Mota, A. Fornés, J. Lladós, Ernest Valveny


Citations: 18

Abstract

In this paper we propose an approach for word spotting in handwritten document images. We state the problem from a focused retrieval perspective, i.e. locating instances of a query word in a large-scale dataset of digitized manuscripts. We combine two approaches: one based on word segmentation and one that is segmentation-free. The first approach uses a hashing strategy to coarsely prune word images that are unlikely to be instances of the query word. This process is fast but has low precision because of the errors introduced in the segmentation step. The regions containing candidate words are then passed to the second process, which is based on a state-of-the-art technique from the field of visual object detection. This discriminative model represents the appearance of the query word and computes a similarity score. The result is a coarse-to-fine approach that achieves a compromise between efficiency and accuracy. We validate the model on a collection of old handwritten manuscripts. We observe a substantial improvement in precision over the previously proposed method, with only a small increase in computational cost.
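
The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the quantization-based hash in `coarse_key`, the fixed-length descriptors, the linear `fine_score` standing in for the discriminative model, and all function and parameter names. The paper's fine stage operates on image regions, whereas this sketch simply rescores precomputed descriptors.

```python
from collections import defaultdict


def coarse_key(descriptor, step=0.5):
    """Quantize a descriptor into a hashable bucket key.

    Hypothetical hashing scheme, chosen only to show the pruning idea.
    """
    return tuple(int(v // step) for v in descriptor)


def build_index(candidates, step=0.5):
    """Map each bucket key to the ids of the word images that fall in it."""
    index = defaultdict(list)
    for word_id, descriptor in candidates.items():
        index[coarse_key(descriptor, step)].append(word_id)
    return index


def fine_score(weights, descriptor):
    """Linear similarity score; an assumed stand-in for the discriminative model."""
    return sum(w * x for w, x in zip(weights, descriptor))


def spot(query_descriptor, weights, candidates, index, step=0.5):
    """Coarse stage: one hash lookup prunes unlikely candidates.

    Fine stage: only the survivors are rescored and ranked, best first.
    """
    survivors = index.get(coarse_key(query_descriptor, step), [])
    scored = [(fine_score(weights, candidates[w]), w) for w in survivors]
    return [w for _, w in sorted(scored, reverse=True)]
```

Called with an index built from a dictionary of candidate descriptors, `spot` returns the ids of the candidates that share the query's bucket, ordered by the fine score. The point of the design is that the more expensive scoring runs only on the small set that survives the cheap lookup, which is the efficiency and accuracy trade-off the abstract describes.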

