A survey of handwritten document pre-processing techniques and customizing for Indic script

V. Hole, L. Ragha
International Conference & Workshop on Emerging Trends in Technology, published 2011-02-25.
DOI: 10.1145/1980022.1980065 (https://doi.org/10.1145/1980022.1980065)
Citations: 3

Abstract

Preprocessing of a document image is a very important step for handling deformations such as noise and the various complexities of handwriting, which may result in baseline skew, word skew and character skew; accents may be placed either above or below the text line, and parts of neighboring text lines may be connected. The paper proposes a novel preprocessing technique for handwritten documents that handles some of the deformations usually present in them, such as touching components, overlapping components, skewed lines and words with individual skews, and builds a proper text image with these deformations removed. Based on an analysis of Indian script character shapes and a literature survey, it proposes a new sequence of preprocessing methods. A binarized image is sub-sampled and its connected components are extracted. These components are dilated and thinned and then given to a Hough transform, which detects both global and local skew for line extraction. Word segmentation is done by computing the distances between adjacent components in the text-line image and classifying these distances as either inter-word gaps or inter-character gaps. The extracted words can be used to produce a properly aligned text image or for text conversion using OCR.
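The first steps of the pipeline (binarize, sub-sample, extract connected components) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the binarization method, the sub-sampling scheme or the connectivity, so the fixed global threshold of 128, OR-pooling of each block and 8-connectivity are all assumptions, as are the hypothetical helper names `binarize`, `subsample` and `connected_components`. Images are lists of rows; in a binary image, 1 marks ink.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = ink, 0 = paper.

    Assumption: a fixed global threshold; the paper does not name one.
    """
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def subsample(img, factor=2):
    """Shrink a binary image by `factor`, OR-pooling each factor x factor
    block (assumed scheme) so that thin strokes are not dropped."""
    rows, cols = len(img), len(img[0])
    return [
        [
            1 if any(img[y][x]
                     for y in range(r, min(r + factor, rows))
                     for x in range(c, min(c + factor, cols))) else 0
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def connected_components(img):
    """Return the 8-connected ink components of a binary image, each as a
    list of (row, col) pixels, found by breadth-first flood fill."""
    rows = len(img)
    cols = len(img[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # Visit all 8 neighbours that are in bounds, ink and unvisited.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components
```

For example, `connected_components(subsample(binarize(gray)))` yields the components that the later dilation, thinning and Hough stages would operate on; those stages are not shown here. OR-pooling (rather than keeping every n-th pixel) is chosen so that one-pixel-wide strokes survive the reduction; a production system would more likely use an adaptive threshold such as Otsu's method for binarization.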
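The skew-detection step can be illustrated with a Hough transform restricted to near-horizontal lines. This sketch is an assumption and not the paper's exact formulation: each input point (for example a pixel of a thinned component, or a component centroid) votes for rho = -x*sin(a) + y*cos(a) at every candidate angle a, and points on a line y = x*tan(a) + b all give rho = b*cos(a), so the true skew concentrates the votes in a single rho bin. The ±15° search range, the 0.5° step, the unit rho bin, the sum-of-squares score and the name `hough_skew` are hypothetical choices. Applied to all points of a page, it gives a global skew estimate; applied to the points of a single line or word, it gives a local one.

```python
import math
from collections import Counter


def hough_skew(points, max_angle=15.0, step=0.5, rho_bin=1.0):
    """Estimate the dominant skew angle, in degrees, of (x, y) points.

    Hypothetical helper: a Hough transform restricted to near-horizontal
    lines. For every candidate angle a, each point votes for the quantized
    rho = -x*sin(a) + y*cos(a); points on y = x*tan(a) + b share one rho.
    An angle's score is the sum of squared bin counts, which is largest
    when the votes pile up in few bins (i.e. when the lines are aligned).
    """
    n = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    # Visit 0, -1, 1, -2, 2, ... steps so ties favour the smallest |angle|.
    for k in sorted(range(-n, n + 1), key=abs):
        a = math.radians(k * step)
        s, c = math.sin(a), math.cos(a)
        acc = Counter(round((-x * s + y * c) / rho_bin) for x, y in points)
        score = sum(v * v for v in acc.values())
        if score > best_score:
            best_angle, best_score = k * step, score
    return best_angle
```

Coordinates follow the image convention (y grows downward), so a positive angle means the line descends to the right. The sum of squares rewards concentration across all text lines at once, which suits a page with several parallel lines better than counting only the single largest bin.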
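The word-segmentation step (distances between adjacent components, classified as inter-word or inter-character gaps) might look like the sketch below. The abstract does not say how the gaps are classified, so the two-class split that maximizes between-class variance (Otsu's criterion applied to the list of gap widths) is a swapped-in assumption, as are the hypothetical helper names `split_threshold` and `segment_words` and the input format: one text line given as component bounding boxes `(x_min, x_max)` in pixels.

```python
def split_threshold(gaps):
    """Return a gap width separating inter-character (small) from
    inter-word (large) gaps, or None if the gaps cannot be split.

    Assumed criterion: Otsu's between-class variance, evaluated at every
    boundary between two distinct sorted gap values.
    """
    values = sorted(gaps)
    if len(values) < 2 or values[0] == values[-1]:
        return None
    best_t, best_score = None, -1.0
    for i in range(1, len(values)):
        if values[i - 1] == values[i]:
            continue  # only cut between distinct values
        low, high = values[:i], values[i:]
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        score = len(low) * len(high) * (m1 - m0) ** 2
        if score > best_score:
            best_t, best_score = (values[i - 1] + values[i]) / 2, score
    return best_t


def segment_words(boxes):
    """Group the (x_min, x_max) boxes of one text line into words."""
    if not boxes:
        return []
    boxes = sorted(boxes)
    # Gap from each box's left edge to the rightmost edge of the boxes
    # before it; overlapping components get a gap of 0.
    gaps, right = [], boxes[0][1]
    for x0, x1 in boxes[1:]:
        gaps.append(max(0, x0 - right))
        right = max(right, x1)
    t = split_threshold(gaps)
    words, current = [], [boxes[0]]
    for gap, box in zip(gaps, boxes[1:]):
        if t is not None and gap > t:  # inter-word gap: close the word
            words.append(current)
            current = []
        current.append(box)
    words.append(current)
    return words
```

Tracking the running right edge keeps overlapping components, such as a vowel sign drawn over its consonant, in the same word. One known weakness of a purely two-class split is that a line containing a single word will still be cut at its largest inter-character gaps; a minimum word-gap width would be one way to guard against that.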