Segmentation algorithm for Arabic handwritten text based on contour analysis

Yusra Osman
Published in: 2013 International Conference on Computing, Electrical and Electronic Engineering (ICCEEE)
Publication date: 2013-10-17
DOI: 10.1109/ICCEEE.2013.6633980 (https://doi.org/10.1109/ICCEEE.2013.6633980)
Citations: 14

Abstract

Segmentation is the process of dividing a binary image into useful regions according to certain conditions. It is the most important phase in any optical character recognition (OCR) system, and its accuracy significantly affects the recognition rate of that system. In cursive languages such as Arabic, segmentation is complicated, especially in handwritten documents, because writers' styles differ and because characters can overlap or form ligatures. Hence, the design of segmentation algorithms must be based on general descriptors that most writers follow. In this paper, a segmentation algorithm for Arabic handwriting is developed. The main idea of the algorithm is to divide the selected image into lines and sub-words. The contour of each sub-word is then traced. After that, the algorithm detects the exact points where the contour changes from a horizontal line to a vertical or curved line. Finally, the coordinates of these points are taken as the segmentation points. The algorithm was tested on words from the IFN/ENIT database. Over 537 tested words containing 3,222 characters, the algorithm achieved 89.4% correct character segmentation points.
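
The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea, not the paper's algorithm. It makes several simplifying assumptions: the image is a list of rows with 1 for ink and 0 for background, sub-words are approximated by runs of non-empty columns (which cannot separate overlapping sub-words), only the upper contour is followed, and the scan goes left to right. The function names and the `min_run` and `tol` parameters are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed encoding: 1 = ink, 0 = background


def split_columns(img: Image) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges separated by empty columns.

    A crude stand-in for sub-word extraction.
    """
    width = len(img[0])
    has_ink = [any(row[x] for row in img) for x in range(width)]
    ranges, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new ink run begins
        elif not ink and start is not None:
            ranges.append((start, x))      # the run ended at an empty column
            start = None
    if start is not None:
        ranges.append((start, width))      # run touches the right edge
    return ranges


def upper_contour(img: Image, x0: int, x1: int) -> List[int]:
    """Row index of the topmost ink pixel in each column of [x0, x1).

    Every column in the range must contain ink, which holds for the
    ranges produced by split_columns.
    """
    return [next(y for y, row in enumerate(img) if row[x]) for x in range(x0, x1)]


def transition_points(contour: List[int], x0: int,
                      min_run: int = 3, tol: int = 0) -> List[Tuple[int, int]]:
    """Return (x, y) points where a flat contour run stops being flat.

    A run counts as horizontal while neighbouring heights differ by at
    most tol; only runs of at least min_run columns are reported.
    """
    points, run = [], 1
    for i in range(1, len(contour)):
        if abs(contour[i] - contour[i - 1]) <= tol:
            run += 1
        else:
            if run >= min_run:
                points.append((x0 + i - 1, contour[i - 1]))
            run = 1
    return points


def segmentation_points(img: Image, min_run: int = 3) -> List[Tuple[int, int]]:
    """Collect candidate segmentation points over all column ranges."""
    points = []
    for x0, x1 in split_columns(img):
        points.extend(transition_points(upper_contour(img, x0, x1), x0, min_run))
    return points


# Toy example: a flat stroke that turns into a vertical stroke at column 4,
# followed by a shorter stroke that is ignored with the default min_run.
DEMO_ROWS = [
    "............",
    "....#.......",
    "....#....#..",
    ".####..###..",
    "............",
]
DEMO = [[1 if c == "#" else 0 for c in row] for row in DEMO_ROWS]
print(segmentation_points(DEMO))  # [(3, 3)]
```

In this toy image the only reported point is the last column of the flat stroke before the contour rises. Lowering `min_run` to 2 also reports the shorter second stroke, which shows how sensitive such a rule is to the threshold chosen for what counts as a horizontal line.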