Segmentation algorithm for Arabic handwritten text based on contour analysis
Yusra Osman, 2013 International Conference on Computing, Electrical and Electronic Engineering (ICCEEE), 17 October 2013. DOI: 10.1109/ICCEEE.2013.6633980
Segmentation is the process of dividing a binary image into useful regions according to certain conditions. It is the most important phase in any optical character recognition (OCR) system, and its accuracy significantly affects the system's recognition rate. In cursive languages such as Arabic, segmentation is complicated, especially in handwritten documents, because writing styles differ from writer to writer and because of special cases such as overlapping characters and ligatures. Hence, segmentation algorithms must be based on general descriptors that most writers follow. In this paper, a segmentation algorithm for Arabic handwriting is developed. The main idea of the algorithm is to divide the selected image into lines and sub-words. Then the contour of each sub-word is traced. Next, the algorithm detects the exact points where the contour changes from a horizontal line to a vertical or curved line. Finally, the coordinates of these points are taken as the segmentation points. The algorithm was tested on words from the IFN/ENIT database. On 537 test words containing 3,222 characters, the algorithm achieved 89.4% correct character segmentation points.
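The pipeline summarized above (split the image into lines, split lines into sub-words, follow each sub-word's contour, and record where a horizontal stretch turns vertical or curved) can be illustrated with a small sketch. This is not the author's implementation, and the abstract does not give its exact rules. It is a minimal Python sketch under stated assumptions: the binary image is a list of rows with 1 for ink; lines are found from the horizontal projection profile; every 8-connected component is treated as a sub-word (no special handling of dots or diacritics); full contour tracing is replaced by the simpler upper contour (the top-most ink pixel in each column); and a hypothetical `min_run` threshold decides when a flat stretch counts as a horizontal line. All function names are illustrative.

```python
from collections import deque


def split_lines(img):
    """Split a binary image (1 = ink) into text-line bands via the horizontal projection."""
    bands, start = [], None
    for y, row in enumerate(img):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            bands.append((start, y))  # (top, bottom), bottom exclusive
            start = None
    if start is not None:
        bands.append((start, len(img)))
    return bands


def connected_components(img, top, bottom):
    """8-connected ink components in rows [top, bottom); each one is treated as a sub-word.
    Assumption: dots and diacritics are NOT merged back into their sub-word here."""
    width = len(img[0]) if img else 0
    seen, comps = set(), []
    for y in range(top, bottom):
        for x in range(width):
            if img[y][x] and (y, x) not in seen:
                comp, queue = set(), deque([(y, x)])
                seen.add((y, x))
                while queue:
                    cy, cx = queue.popleft()
                    comp.add((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (top <= ny < bottom and 0 <= nx < width
                                    and img[ny][nx] and (ny, nx) not in seen):
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                comps.append(comp)
    # Arabic is written right to left, so order sub-words by their rightmost column.
    comps.sort(key=lambda c: -max(x for _, x in c))
    return comps


def upper_contour(comp):
    """Top-most ink row per column: a simplified stand-in for full contour tracing."""
    top = {}
    for y, x in comp:
        if x not in top or y < top[x]:
            top[x] = y
    return top


def transition_points(comp, min_run=3):
    """(x, y) points where a flat run of the upper contour (>= min_run columns) ends
    and the contour turns vertical or curved, scanning right to left."""
    top = upper_contour(comp)
    # The columns of an 8-connected component form one contiguous range.
    cols = sorted(top, reverse=True)
    points, run = [], 1
    for prev, cur in zip(cols, cols[1:]):
        if top[cur] == top[prev]:
            run += 1  # contour still flat: extend the horizontal run
        else:
            if run >= min_run:
                points.append((prev, top[prev]))  # last pixel of the horizontal run
            run = 1
    return points


def segmentation_points(img, min_run=3):
    """One list of candidate segmentation points per sub-word, line by line."""
    return [transition_points(comp, min_run)
            for top, bottom in split_lines(img)
            for comp in connected_components(img, top, bottom)]


demo_rows = [
    "#........#..#",
    "#........#..#",
    "##########..#",
    ".............",
    "###..........",
]
demo_img = [[1 if ch == "#" else 0 for ch in row] for row in demo_rows]
print(segmentation_points(demo_img))  # [[], [(1, 2)], []]
```

In this toy image the only point reported is at x = 1, y = 2, where the horizontal baseline stroke meets the left vertical stroke; the single-column stroke and the flat second line produce no transitions. Strict equality between neighbouring contour heights is a deliberate simplification; real handwriting would need some tolerance for jitter, plus whatever additional rules the paper applies.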