Advanced Stroke Labeling Technique Based on Directions Features for Arabic Character Segmentation

Tarik Abu-Ain, S. N. H. S. Abdullah, K. Omar, Siti Zaharah Abd. Rahman
Journal: Asia-Pacific Journal of Information Technology & Multimedia
Publication date: 2019-06-30
DOI: 10.17576/APJITM-2019-0801-08 (https://doi.org/10.17576/APJITM-2019-0801-08)
Citations: 2

Abstract

Offline character segmentation of text images is an important step in many document image analysis and recognition (DIAR) applications. However, character segmentation remains an open problem for both writing styles, printed and handwritten. Manual segmentation is time-consuming and impractical for large numbers of documents, and automatic segmentation is more challenging and complex for unconstrained cursive handwriting. Arabic script suffers from several common problems, such as overlapping sub-words, overlapping characters, and missed characters. These issues have attracted the attention of DIAR researchers working on Arabic character segmentation.

The proposed method combines a new advanced Stroke Labelling based on Direction Features (SLDF2) technique with a modified vertical projection histogram (MVPH) technique. The labelling technique extracts the relationship between each text stroke pixel and its 8 neighboring foreground pixels and labels the pixel with the proper value before the possible segmentation points are identified. The text is prepared for segmentation through multiple preprocessing steps followed by the advanced stroke labelling based on direction features. Several structural rules of the Arabic language were defined to detect the candidate segmentation points (CSP), detect many cases of character overlap, solve the missed-characters problem that arises from using the text skeleton in the vertical projection histogram (VPH), and validate the CSP.

All techniques and methods were tested on the ACDAR benchmark database. Segmentation accuracy was measured quantitatively using recall, precision, and F-measure. The average accuracy of the proposed segmentation method was 92.44%, which outperforms the state-of-the-art method.
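The three generic building blocks named in the abstract (labelling each stroke pixel from its 8 neighbours, a vertical projection histogram, and precision/recall/F-measure scoring) can be illustrated with a minimal Python sketch. It is not the authors' SLDF2 or MVPH, and it contains none of their structural rules. The 8-bit neighbour code, the neighbour ordering, the `max_height` threshold, exact-match scoring, and all function names are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple

# Binary image: 1 = stroke (foreground) pixel, 0 = background.
Image = Sequence[Sequence[int]]

# (row, col) offsets of the 8 neighbours, clockwise from east.
# This ordering is an assumption, not taken from the paper.
NEIGHBOURS = [(0, 1), (1, 1), (1, 0), (1, -1),
              (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def neighbour_codes(img: Image) -> List[List[int]]:
    """Label each foreground pixel with an 8-bit code: bit k is set
    when neighbour k (see NEIGHBOURS) is also a foreground pixel."""
    rows, cols = len(img), len(img[0])
    codes = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue  # background pixels keep label 0
            code = 0
            for k, (dr, dc) in enumerate(NEIGHBOURS):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and img[rr][cc]:
                    code |= 1 << k
            codes[r][c] = code
    return codes


def vertical_projection(img: Image) -> List[int]:
    """Count the foreground pixels in every column."""
    return [sum(column) for column in zip(*img)]


def candidate_columns(hist: Sequence[int], max_height: int = 1) -> List[int]:
    """Columns with some ink but at most max_height pixels, i.e. thin
    connecting strokes. The threshold is a hypothetical parameter."""
    return [x for x, h in enumerate(hist) if 0 < h <= max_height]


def precision_recall_f(predicted: Sequence[int],
                       truth: Sequence[int]) -> Tuple[float, float, float]:
    """Score predicted segmentation columns against ground truth,
    counting only exact column matches as correct."""
    true_positives = len(set(predicted) & set(truth))
    precision = true_positives / len(set(predicted)) if predicted else 0.0
    recall = true_positives / len(set(truth)) if truth else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Two blobs joined by a one-pixel-thick baseline stroke.
img = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
hist = vertical_projection(img)
candidates = candidate_columns(hist)
print(hist, candidates)
```

In this toy image the thin baseline between the two blobs is the only region where the column count drops to one pixel, so those columns become the candidate cut positions. A real system would then filter such candidates with script-specific rules, as the abstract describes.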