Arabic Handwritten Text to Line Segmentation

Fariza Meziani, Lallouani Bouchakour, Khadidja Ghribi, Mustapha Yahiaoui, H. Latrache, Mourad Abbas
Published in: 2021 International Conference on Information Systems and Advanced Technologies (ICISAT)
Publication date: 2021-12-27
DOI: 10.1109/ICISAT54145.2021.9678458 (https://doi.org/10.1109/ICISAT54145.2021.9678458)
Citations: 1

Abstract

Text-to-line segmentation is a crucial phase in a character recognition system, since segmentation errors affect recognition accuracy. In this work, we present a novel and simple method for segmenting Arabic handwritten text images into text lines. After converting the grayscale images to binary ones, the proposed method combines three approaches, based on the horizontal projection profile (HPP), connected components (CC), and the skeleton. First, we apply the smoothed horizontal projection profile to detect the approximate beginning and end of each line. Then, we identify the connected components in each line and compute their centroids in order to cluster them into individual text lines. Finally, where characters touch vertically, we use the skeleton to separate them by calculating its intersection point. The experiments are performed on 100 text images from the Khatt database, and the approach is evaluated with the MatchScore criterion. The obtained results demonstrate the efficiency of our method.
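
The first two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the moving-average smoothing, the threshold (half the mean of the profile), and the nearest-band rule for centroids are all assumptions made for illustration. Connected-component labeling is assumed to have been done elsewhere, so only the centroid row coordinates are passed in, and the skeleton-based separation of touching characters is not covered.

```python
import numpy as np

def smoothed_hpp(binary, window=5):
    """Count ink pixels per row, then smooth with a moving average (assumed smoother)."""
    profile = binary.sum(axis=1).astype(float)
    kernel = np.ones(window) / window
    return np.convolve(profile, kernel, mode="same")

def line_bands(profile, threshold):
    """Return (start, end) row ranges, end exclusive, where the profile exceeds threshold."""
    active = profile > threshold
    padded = np.concatenate(([False], active, [False]))
    # Indices where the active/inactive state flips: starts and ends alternate.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))

def assign_to_lines(centroid_rows, bands):
    """Assign each component to the band whose centre row is nearest its centroid."""
    centres = np.array([(start + end - 1) / 2 for start, end in bands])
    rows = np.asarray(centroid_rows, dtype=float)
    return np.abs(rows[:, None] - centres[None, :]).argmin(axis=1).tolist()

# Toy binary page (1 = ink) with two horizontal bars standing in for text lines.
page = np.zeros((20, 30), dtype=np.uint8)
page[3:6, 2:28] = 1
page[12:16, 4:25] = 1

profile = smoothed_hpp(page, window=3)
bands = line_bands(profile, threshold=0.5 * profile.mean())  # threshold is an assumption

# Hypothetical centroid rows of three components; the third lies between the lines.
labels = assign_to_lines([4.0, 13.5, 9.5], bands)
print(bands, labels)
```

Smoothing widens each detected band slightly, which fits the abstract's point that the profile only approximates where lines begin and end; the centroid step then decides which line each component belongs to, including components that fall between bands.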