An indexed full-text search method of printed document images with an M-tree

RIAO Conference Pub Date : 2010-04-28 DOI:10.5555/1937055.1937071

Hajime Imura, Yuzuru Tanaka

Citations: 2

Abstract

This paper describes an indexed full-text search method that finds occurrences of a specified character-string image in printed document images. It is based on N-gram indexing with an M-tree index structure. Providing full-text search is important for working with collections of historical letterpress printing. The proposed method is independent of language and font because it uses a pseudo-coding scheme based on the statistical features of character shapes. Conventional word-spotting methods require a sequential scan of the whole document image and a matching calculation over the document's entire descriptor sequence. The proposed N-gram-based indexing method accelerates the search process with an M-tree. The method was evaluated in terms of search time and recall-precision curves for N-gram-based query strings. The experiments demonstrated an approximately hundredfold improvement in search time.
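The idea of indexing N-grams of character descriptors in a metric tree can be illustrated with a minimal sketch. It is not the authors' implementation: the per-character descriptors are hypothetical toy vectors (the paper's pseudo-coding scheme is not reproduced), Euclidean distance is an assumed metric, and the `Node` class is a simplified, statically built metric tree that only borrows the covering-radius pruning rule used by M-trees, without dynamic insertion or node splitting.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Entry = Tuple[Vector, int]  # (N-gram descriptor, start position in the document)


def ngrams(descriptors: Sequence[Vector], n: int) -> List[Entry]:
    """Concatenate n consecutive per-character descriptors into one vector."""
    out = []
    for i in range(len(descriptors) - n + 1):
        vec = tuple(x for d in descriptors[i:i + n] for x in d)
        out.append((vec, i))
    return out


class Node:
    """Simplified metric-tree node: a pivot, a covering radius, and either
    leaf entries or two children (a stand-in for a real M-tree node)."""

    def __init__(self, entries: List[Entry], leaf_size: int = 4):
        self.pivot = entries[0][0]
        dists = [math.dist(self.pivot, v) for v, _ in entries]
        self.radius = max(dists)  # every entry below lies within this radius
        self.entries: List[Entry] = []
        self.children: List["Node"] = []
        if len(entries) <= leaf_size:
            self.entries = entries
        else:
            # Split at the median distance from the pivot.
            order = sorted(range(len(entries)), key=dists.__getitem__)
            mid = len(order) // 2
            self.children = [
                Node([entries[i] for i in order[:mid]], leaf_size),
                Node([entries[i] for i in order[mid:]], leaf_size),
            ]

    def range_query(self, q: Vector, r: float, hits: List[int]) -> None:
        # Triangle inequality: no entry in this subtree can be within r of q.
        if math.dist(q, self.pivot) > r + self.radius:
            return
        for v, pos in self.entries:
            if math.dist(q, v) <= r:
                hits.append(pos)
        for child in self.children:
            child.range_query(q, r, hits)


def search(root: Node, q: Vector, r: float) -> List[int]:
    """Return sorted start positions of N-grams within distance r of q."""
    hits: List[int] = []
    root.range_query(q, r, hits)
    return sorted(hits)


# Toy "document": one 2-D descriptor per character (hypothetical values).
doc = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
       (0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
index = Node(ngrams(doc, 2))
print(search(index, (0.0, 0.0, 1.0, 0.0), 0.1))  # prints [0, 4]
```

The query bigram matches at positions 0 and 4. Subtrees whose covering ball cannot intersect the query ball are skipped without comparing individual entries, which is how a metric index avoids the sequential scan.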

