Skeleton extraction: Comparison of five methods on the Arabic IFN/ENIT database

2014 6th International Conference on Computer Science and Information Technology (CSIT) | Pub Date: 2014-03-26 | DOI: 10.1109/CSIT.2014.6805978

Atallah Al-Shatnawi, K. Omar, Bader M. AlFawwaz, A. Zeki


Citations: 9

Abstract

Thinning (skeletonization) is a crucial stage in Arabic Character Recognition (ACR) systems. It simplifies the text shape and reduces the amount of data to be handled, and it is usually used as a pre-processing stage for recognition and storage systems. The skeleton of Arabic text can be used for baseline detection, character segmentation, and feature extraction, ultimately supporting classification. In this paper, five state-of-the-art thinning algorithms are selected and implemented: SPTA, the Zhang-Suen parallel thinning algorithm, the Huang-Wan-Liu thinning algorithm, and the thinning and skeletonization algorithms based on morphological operations. The five algorithms are applied to the IFN/ENIT dataset, and their results are discussed and analyzed in terms of preserving shape and text connectivity, preventing spurious tails, maintaining a one-pixel-wide skeleton, avoiding the necking problem, and running-time efficiency. In addition, performance measures for checking text connectivity and spurious tails and for calculating stroke thickness are proposed and carried out.
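The abstract names the Zhang-Suen parallel thinning algorithm but gives no implementation details. The following is a minimal sketch of the commonly cited textbook formulation of that algorithm, not the authors' code. The function names (`zhang_suen_thin`, `count_components`) are hypothetical, and using an 8-connected component count as a connectivity check is an assumption made for illustration; it is not necessarily the measure proposed in the paper.

```python
def _neighbours(img, r, c):
    # P2..P9 in clockwise order, starting from the pixel directly above.
    return [img[r-1][c], img[r-1][c+1], img[r][c+1], img[r+1][c+1],
            img[r+1][c], img[r+1][c-1], img[r][c-1], img[r-1][c-1]]


def zhang_suen_thin(image):
    """Thin a binary image (list of rows, 1 = ink) with Zhang-Suen."""
    h, w = len(image), len(image[0])
    # Pad with a zero border so every ink pixel has eight neighbours.
    img = ([[0] * (w + 2)]
           + [[0] + [1 if v else 0 for v in row] + [0] for row in image]
           + [[0] * (w + 2)])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_clear = []
            for r in range(1, h + 1):
                for c in range(1, w + 1):
                    if not img[r][c]:
                        continue
                    p = _neighbours(img, r, c)
                    b = sum(p)  # number of ink neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Parallel update: clear flagged pixels only after the full scan.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return [row[1:-1] for row in img[1:-1]]


def count_components(image):
    """Count 8-connected ink components (assumed connectivity check)."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if image[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and image[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
    return count


# Synthetic example: a 3-pixel-thick horizontal stroke on a 5x10 canvas.
bar = [[1 if 1 <= r <= 3 and 1 <= c <= 8 else 0 for c in range(10)]
       for r in range(5)]
skeleton = zhang_suen_thin(bar)
print(count_components(bar), count_components(skeleton))
```

Comparing the component count before and after thinning gives one simple way to flag a skeleton that has broken apart. A production comparison on IFN/ENIT images would more likely rely on an established image-processing library than on this pure-Python loop, which is written for readability and not for speed.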
