A Full-Text Search System for Images of Hand-Written Cursive Documents

2010 12th International Conference on Frontiers in Handwriting Recognition. Pub Date: 2010-11-16. DOI: 10.1109/ICFHR.2010.105

Hajime Imura, Yuzuru Tanaka

Citations: 3

Abstract

We propose a full-text search technique for scanned document images that does not recognize individual characters, yet is as fast as full-text search over machine-readable documents. Such a system is important when working with historical handwritten manuscripts. The proposed method works independently of language and font because it uses a new pseudo-coding scheme based on the statistical features of character shapes. We evaluated the method with recall-precision curves for n-gram query strings in Japanese manuscripts and word query strings in English manuscripts, using two types of image features and two different pseudo-coding schemes. The results show that precision exceeded 50% at a recall of 80% for 3-gram queries in the Japanese manuscripts. They also indicate that our pseudo-code is well suited to applications that use machine-learning techniques: combining an HMM-based filtering method with our pseudo-code significantly improves retrieval precision.
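The abstract does not spell out the pseudo-coding algorithm, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method. We assume each segmented character image has already been reduced to a numeric shape-feature vector, that a pseudo-code is the index of the nearest centroid in a previously learned codebook (for example from k-means), and that a query is supplied as example character images. Once pages are turned into pseudo-code strings, search becomes ordinary n-gram text search. The names `pseudo_code`, `encode_line`, `build_ngram_index` and `search`, and all numbers in the usage lines, are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical types: one feature vector per segmented character image.
Vector = tuple[float, ...]
Index = dict[tuple[int, ...], set[tuple[str, int]]]


def pseudo_code(features: Vector, codebook: list[Vector]) -> int:
    """Return the index of the nearest centroid; this index is the pseudo-code.

    Assumption: the codebook was learned beforehand (e.g. k-means over
    shape-feature vectors), so visually similar characters share a code.
    """
    return min(range(len(codebook)), key=lambda i: math.dist(features, codebook[i]))


def encode_line(char_features: list[Vector], codebook: list[Vector]) -> list[int]:
    """Turn a sequence of character feature vectors into pseudo-text (no recognition)."""
    return [pseudo_code(f, codebook) for f in char_features]


def build_ngram_index(docs: dict[str, list[int]], n: int) -> Index:
    """Map every n-gram of pseudo-codes to the (doc_id, offset) pairs where it starts."""
    index: Index = defaultdict(set)
    for doc_id, codes in docs.items():
        for i in range(len(codes) - n + 1):
            index[tuple(codes[i:i + n])].add((doc_id, i))
    return dict(index)


def search(index: Index, query: list[int], n: int) -> list[tuple[str, int]]:
    """Return sorted (doc_id, offset) hits where the whole query occurs.

    A query longer than n is matched by chaining its overlapping n-grams:
    the k-th n-gram must start exactly k positions after the first one.
    """
    if len(query) < n:
        raise ValueError("query must contain at least n pseudo-codes")
    hits = set(index.get(tuple(query[:n]), set()))
    for k in range(1, len(query) - n + 1):
        gram_hits = index.get(tuple(query[k:k + n]), set())
        hits = {(doc, pos) for doc, pos in hits if (doc, pos + k) in gram_hits}
    return sorted(hits)


# Toy usage with made-up 2-D features and a 3-entry codebook.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
pages = {"page1": encode_line([(0.1, 0.0), (0.9, 0.1), (0.1, 0.8), (1.1, -0.1), (0.0, 0.9)], codebook)}
index = build_ngram_index(pages, n=2)
query = encode_line([(0.05, 0.02), (0.95, 0.0), (0.0, 1.05)], codebook)
print(search(index, query, n=2))  # [('page1', 0)]
```

Exact matching on nearest-centroid codes is brittle when a character shape falls near a cluster boundary, which is one reason a filtering stage helps; the abstract reports gains from HMM-based filtering, which this sketch does not attempt to model.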

