A Novel Arabic Baseline Estimation Algorithm Based on Sub-Words Treatment

2010 12th International Conference on Frontiers in Handwriting Recognition. Pub. date: 2010-11-16. DOI: 10.1109/ICFHR.2010.58

H. Boukerma, N. Farah

Citations: 19

Abstract

Baseline detection is an essential preprocessing step in many OCR systems: it directly affects the efficiency and reliability of the character segmentation and feature extraction stages, and therefore contributes strongly to higher recognition accuracy. For handwritten Arabic, conventional methods that extract the baseline as a single straight line are ill-suited, because an Arabic word may be composed of two or more sub-words (pieces of Arabic words, PAWs), and the arrangement of these sub-words can produce different slant angles within the same word. Addressing the source of the problem, we propose a novel Arabic baseline estimation algorithm that takes the PAW, rather than the whole word, as the basic processing unit. Experimental results on the IFN/ENIT [1] database demonstrate the efficiency of the proposed algorithm.
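The abstract contrasts one straight baseline per word with one baseline per PAW. The sketch below illustrates only that contrast; it is not the authors' algorithm, whose details are not given here. It assumes a binarized image (1 = ink), approximates each PAW as an 8-connected component (a simplification), drops components smaller than a hypothetical `min_pixels` threshold as diacritic dots, and uses the horizontal-projection peak, a common conventional heuristic, as each baseline. All function names are illustrative.

```python
from collections import deque


def connected_components(img):
    """Group 8-connected ink pixels; each group stands in for one PAW (assumption)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue, pixels = deque([(y, x)]), []
                while queue:  # breadth-first flood fill
                    cy, cx = queue.popleft()
                    pixels.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def projection_baseline(pixels):
    """Row with the most ink pixels (horizontal projection peak), or None if empty."""
    counts = {}
    for y, _ in pixels:
        counts[y] = counts.get(y, 0) + 1
    if not counts:
        return None
    # Ties go to the larger row index (lower on the page).
    return max(counts, key=lambda r: (counts[r], r))


def paw_baselines(img, min_pixels=3):
    """Per-sub-word baselines as (x_min, x_max, row), sorted by leftmost column.

    min_pixels is a hypothetical threshold for discarding dots.
    """
    result = []
    for pixels in connected_components(img):
        if len(pixels) < min_pixels:
            continue
        xs = [x for _, x in pixels]
        result.append((min(xs), max(xs), projection_baseline(pixels)))
    return sorted(result)


def word_baseline(img):
    """Conventional single straight baseline for the whole word, for comparison."""
    pixels = [(y, x) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    return projection_baseline(pixels)


if __name__ == "__main__":
    rows = [
        "..........",
        "......####",  # raised sub-word on the right
        "......#...",
        "..........",
        "####......",  # sub-word on the left
        "#.........",
        "..#.......",  # isolated dot, filtered out by min_pixels
    ]
    img = [[1 if c == "#" else 0 for c in r] for r in rows]
    print(paw_baselines(img))  # [(0, 3, 4), (6, 9, 1)]
    print(word_baseline(img))  # 4
```

On this toy image the word-level projection ties between rows 1 and 4 and resolves to row 4, which fits the left sub-word but not the raised right one; the per-PAW version gives each sub-word its own baseline row.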
