Handwritten Text Line Identification in Indian Scripts

2009 10th International Conference on Document Analysis and Recognition. Pub Date: 2009-07-26. DOI: 10.1109/ICDAR.2009.69

B. Chaudhuri, Sumedha Bera

Citations: 31

Abstract

Preprocessing in handwritten text OCR involves line, word, and character segmentation. This paper addresses text-line identification in handwritten Indian scripts, chiefly Bangla, and also in Hindi, Malayalam, and English, among others. A new dual method is proposed, based on the interdependency between text lines and inter-line gaps. The method draws curves simultaneously through text points and inter-line gap points, which are found from strip-wise histogram peaks and inter-peak valleys, respectively. The curves start at the left and move right, with points of one type guiding the curves of the other type so that the curves do not intersect. The curves are then allowed to evolve iteratively so that text-line curves cross more character strokes and inter-line curves cross fewer, while all curves are kept as straight as possible. After several iterations, the curves stabilize and define the final text lines and inter-line gaps. The approach works well on text in different scripts and with various geometric layouts, including poetry.
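The first step named in the abstract, finding text points and gap points from strip-wise histogram peaks and inter-peak valleys, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `strip_width` and `min_height` parameters, the thresholded definition of a peak, and the binary list-of-lists image format are all assumptions made for illustration, and the curve-drawing and iterative evolution stages are not covered.

```python
# Illustrative sketch only; names, parameters, and the peak rule are assumptions.

def strip_profiles(image, strip_width):
    """Row-wise ink counts for each vertical strip of a binary image (1 = ink)."""
    width = len(image[0])
    profiles = []
    for x0 in range(0, width, strip_width):
        x1 = min(x0 + strip_width, width)  # the last strip may be narrower
        profiles.append([sum(row[x0:x1]) for row in image])
    return profiles


def peaks_and_valleys(profile, min_height):
    """Return (peak rows, valley rows) for one strip profile.

    Assumed rule: a peak is the first highest row inside a maximal run of
    rows whose count is >= min_height; a valley is the first lowest row
    strictly between two consecutive peaks.
    """
    peaks = []
    run_start = None
    # The -inf sentinel closes a run that reaches the bottom of the strip.
    for y, value in enumerate(profile + [float("-inf")]):
        if value >= min_height and run_start is None:
            run_start = y
        elif value < min_height and run_start is not None:
            peaks.append(max(range(run_start, y), key=lambda r: profile[r]))
            run_start = None
    valleys = [
        min(range(a + 1, b), key=lambda r: profile[r])
        for a, b in zip(peaks, peaks[1:])
    ]
    return peaks, valleys


if __name__ == "__main__":
    # Toy 9 x 8 image with two handwritten "lines" (rows 1-3 and rows 5-7).
    image = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 0, 1, 1, 0],
        [1, 1, 0, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    for index, profile in enumerate(strip_profiles(image, strip_width=4)):
        print(index, peaks_and_valleys(profile, min_height=2))
```

Each strip yields its own peak and valley rows, so a skewed or curved line shows up as peak rows that drift from strip to strip. Those per-strip points are what the text-line and inter-line curves would then be drawn through. A practical pipeline would probably smooth each profile before peak detection, which this sketch omits.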



