阿拉伯语手写文本行提取

Proceedings of Sixth International Conference on Document Analysis and Recognition Pub Date : 2001-09-10 DOI:10.1109/ICDAR.2001.953799

Abderrazak Zahour, B. Taconet, P. Mercy, Said Ramdane

{"title":"阿拉伯语手写文本行提取","authors":"Abderrazak Zahour, B. Taconet, P. Mercy, Said Ramdane","doi":"10.1109/ICDAR.2001.953799","DOIUrl":null,"url":null,"abstract":"This paper describes a text-line extraction based method. The typical segmentation for a printed binary document is based on the horizontal projection analysis and then the regrouping of the connected components. These techniques can't be used for handwritten unconstrained text because data frequently contain undulations and shifts in the baseline, baseline-skew variability and inter-line distance variability. So, we think that the border line for a handwritten unconstrained documents should be a collection of horizontal line segments. From this point of view, we use a partial contour following based method to detect the separating lines. In the current version of our algorithm, we proceed to text slant detection, text line number evaluation by using partial projection. Then we carry out a partial contour following of every line; first in the direction of the writing, then in the opposite direction. After the treatment, the adjacent lines are separated. In the experimental session, we describe the application of the algorithm used for the extraction of text line. Database images contains about one hundred handwritten Arabic texts written by different writers. Results about diacritical points affectation are also reported.","PeriodicalId":277816,"journal":{"name":"Proceedings of Sixth International Conference on Document Analysis and Recognition","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"127","resultStr":"{\"title\":\"Arabic hand-written text-line extraction\",\"authors\":\"Abderrazak Zahour, B. Taconet, P. Mercy, Said Ramdane\",\"doi\":\"10.1109/ICDAR.2001.953799\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper describes a text-line extraction based method. The typical segmentation for a printed binary document is based on the horizontal projection analysis and then the regrouping of the connected components. These techniques can't be used for handwritten unconstrained text because data frequently contain undulations and shifts in the baseline, baseline-skew variability and inter-line distance variability. So, we think that the border line for a handwritten unconstrained documents should be a collection of horizontal line segments. From this point of view, we use a partial contour following based method to detect the separating lines. In the current version of our algorithm, we proceed to text slant detection, text line number evaluation by using partial projection. Then we carry out a partial contour following of every line; first in the direction of the writing, then in the opposite direction. After the treatment, the adjacent lines are separated. In the experimental session, we describe the application of the algorithm used for the extraction of text line. Database images contains about one hundred handwritten Arabic texts written by different writers. Results about diacritical points affectation are also reported.\",\"PeriodicalId\":277816,\"journal\":{\"name\":\"Proceedings of Sixth International Conference on Document Analysis and Recognition\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-09-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"127\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of Sixth International Conference on Document Analysis and Recognition\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDAR.2001.953799\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of Sixth International Conference on Document Analysis and Recognition","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDAR.2001.953799","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 127

摘要

本文描述了一种基于文本行提取的方法。典型的打印二进制文档分割方法是基于水平投影分析，然后将连接的组件重新分组。这些技术不能用于手写的无约束文本，因为数据经常包含基线的波动和移动、基线倾斜可变性和线间距离可变性。因此，我们认为手写的无约束文档的边界线应该是水平线段的集合。从这个角度来看，我们使用基于部分轮廓跟踪的方法来检测分离线。在当前版本的算法中，我们继续使用部分投影进行文本倾斜检测，文本行数评估。然后对每条线进行局部轮廓跟踪;首先顺着书写的方向，然后顺着相反的方向。处理后，相邻的线被分开。在实验部分，我们描述了该算法在文本行提取中的应用。数据库图像包含大约100个由不同作者手写的阿拉伯文本。本文还报道了变音符点影响的结果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Arabic hand-written text-line extraction

This paper describes a text-line extraction based method. The typical segmentation for a printed binary document is based on the horizontal projection analysis and then the regrouping of the connected components. These techniques can't be used for handwritten unconstrained text because data frequently contain undulations and shifts in the baseline, baseline-skew variability and inter-line distance variability. So, we think that the border line for a handwritten unconstrained documents should be a collection of horizontal line segments. From this point of view, we use a partial contour following based method to detect the separating lines. In the current version of our algorithm, we proceed to text slant detection, text line number evaluation by using partial projection. Then we carry out a partial contour following of every line; first in the direction of the writing, then in the opposite direction. After the treatment, the adjacent lines are separated. In the experimental session, we describe the application of the algorithm used for the extraction of text line. Database images contains about one hundred handwritten Arabic texts written by different writers. Results about diacritical points affectation are also reported.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of Sixth International Conference on Document Analysis and Recognition

自引率

0.00%

发文量