Line segmentation for grayscale text images of Khmer palm leaf manuscripts

Dona Valy, M. Verleysen, Kimheng Sok
Published in: 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA)
Publication date: 2017-11-01
DOI: 10.1109/IPTA.2017.8310097 (https://doi.org/10.1109/IPTA.2017.8310097)
Citations: 5

Abstract

Text line segmentation is one of the most essential pre-processing steps in character recognition and document analysis. In ancient documents, a variety of deformations caused by aging produce noise, which makes binarization very challenging. Moreover, because of irregular layout such as skew and fluctuation of text lines, segmenting an ancient manuscript page into lines remains an open problem. In this paper, we propose a novel line segmentation scheme for grayscale images of ancient Khmer documents. First, a stroke width transform is applied to extract connected components from the document page. The number and medial positions of text lines are then estimated using a modified piece-wise projection profile technique, and those positions are adjusted adaptively according to the curvature of the actual text lines. Finally, a path finding approach is used to separate touching components and to mark the boundaries of the text lines. Experiments on a dataset of 110 pages of Khmer palm leaf manuscript images compare the robustness of the proposed approach with that of existing methods from the literature.
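The abstract does not spell out how the piece-wise projection profile is modified, so the following is only a minimal sketch of the plain, unmodified technique, not the authors' method. It assumes a binary ink map (1 for stroke pixels, 0 for background) rather than the grayscale input and stroke width transform used in the paper. The page is cut into vertical slices, each slice gets its own smoothed row-sum profile, and the centres of the high-profile runs are taken as candidate medial positions. All function names, parameters, and the threshold rule are hypothetical.

```python
from typing import List


def row_profile(ink: List[List[int]], x0: int, x1: int) -> List[int]:
    """Horizontal projection profile: ink count per row within columns [x0, x1)."""
    return [sum(row[x0:x1]) for row in ink]


def smooth(profile: List[float], radius: int) -> List[float]:
    """Moving average over a window of 2*radius+1 rows, truncated at the borders."""
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - radius), min(len(profile), i + radius + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out


def run_centres(profile: List[float], min_height: float) -> List[float]:
    """Centre row of every maximal run of rows whose value is >= min_height."""
    centres = []
    start = None
    # A trailing sentinel closes a run that reaches the last row.
    for i, value in enumerate(list(profile) + [float("-inf")]):
        if value >= min_height and start is None:
            start = i
        elif value < min_height and start is not None:
            centres.append((start + i - 1) / 2)
            start = None
    return centres


def piecewise_line_positions(
    ink: List[List[int]],
    n_slices: int,
    radius: int = 1,
    rel_threshold: float = 0.6,  # assumed fraction of the slice maximum
) -> List[List[float]]:
    """Estimate text-line medial rows separately in each vertical slice."""
    width = len(ink[0])
    positions = []
    for s in range(n_slices):
        x0 = s * width // n_slices
        x1 = (s + 1) * width // n_slices
        profile = smooth(row_profile(ink, x0, x1), radius)
        peak = max(profile)
        if peak > 0:
            positions.append(run_centres(profile, rel_threshold * peak))
        else:
            positions.append([])  # slice contains no ink at all
    return positions
```

Because each slice is profiled independently, a skewed or wavy line shows up as medial rows that drift from one slice to the next, which is the information a later curvature-adaptive step would need. The number of slices trades robustness for locality: wide slices average out noise but blur skew, narrow slices follow curvature but may contain too little ink.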