Arabic Handwritten Text to Line Segmentation
Fariza Meziani, Lallouani Bouchakour, Khadidja Ghribi, Mustapha Yahiaoui, H. Latrache, Mourad Abbas
2021 International Conference on Information Systems and Advanced Technologies (ICISAT), published 2021-12-27
DOI: 10.1109/ICISAT54145.2021.9678458 (https://doi.org/10.1109/ICISAT54145.2021.9678458)
Citation count: 1
Abstract
Text-to-line segmentation is a crucial phase in a character recognition system, since segmentation errors affect recognition accuracy. In this work we present a novel and simple method for segmenting Arabic handwritten text images into text lines. After converting the grayscale images to binary ones, the proposed method combines three approaches, based respectively on the horizontal projection profile (HPP), on connected components (CC), and on the skeleton. First, we apply a smoothed horizontal projection profile to detect the approximate beginning and end of each line. Then, we identify the connected components in each line and compute their centroids in order to cluster them into individual text lines. Finally, where characters from adjacent lines touch vertically, we use the skeleton to separate them by locating its intersection point. The experiments are performed on 100 text images from the Khatt database. The approach is evaluated with the MatchScore criterion. The results demonstrate the effectiveness of our method.
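The first two stages described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names, the zero-padded moving-average smoothing, the fixed threshold, the 8-connectivity, and the nearest-band assignment rule are all assumptions made for illustration, and the skeleton-based splitting of touching characters is omitted. The `match_score` helper uses the intersection-over-union form common in handwriting segmentation contests. The abstract does not say which MatchScore variant the paper uses.

```python
from collections import deque

def horizontal_projection(binary):
    """Count ink pixels (value 1) in every row of a binary image."""
    return [sum(row) for row in binary]

def smooth(profile, window=3):
    """Moving average over `window` rows; rows outside the image count as empty."""
    half = window // 2
    out = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        out.append(sum(profile[lo:hi]) / window)
    return out

def line_bands(profile, threshold):
    """Return (start, end) row ranges, end exclusive, where profile > threshold."""
    bands, start = [], None
    for y, value in enumerate(profile):
        if value > threshold and start is None:
            start = y
        elif value <= threshold and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands

def component_centroids(binary):
    """Label 8-connected ink components by BFS; return their (y, x) centroids."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y0 in range(h):
        for x0 in range(w):
            if binary[y0][x0] != 1 or seen[y0][x0]:
                continue
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            sum_y = sum_x = count = 0
            while queue:
                y, x = queue.popleft()
                sum_y, sum_x, count = sum_y + y, sum_x + x, count + 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            centroids.append((sum_y / count, sum_x / count))
    return centroids

def assign_to_line(centroid_y, bands):
    """Index of the band whose vertical centre is closest to the centroid row."""
    centres = [(start + end - 1) / 2 for start, end in bands]
    return min(range(len(bands)), key=lambda i: abs(centres[i] - centroid_y))

def match_score(result_pixels, truth_pixels):
    """Intersection over union of two pixel sets (assumed MatchScore form)."""
    union = result_pixels | truth_pixels
    return len(result_pixels & truth_pixels) / len(union) if union else 0.0

# Toy binary page with two text lines (1 = ink), used only for illustration.
TOY_PAGE = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

bands = line_bands(smooth(horizontal_projection(TOY_PAGE)), threshold=1.5)
line_of = [assign_to_line(cy, bands) for cy, _ in component_centroids(TOY_PAGE)]
```

On the toy page, `bands` holds the approximate row extent of each line and `line_of` gives the line index for each connected component. On real handwriting the threshold and window would need tuning, and components that span two bands would still need the skeleton-based separation the abstract describes.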