Abderrazak Zahour, B. Taconet, P. Mercy, Said Ramdane
Title: Arabic hand-written text-line extraction
Published in: Proceedings of Sixth International Conference on Document Analysis and Recognition, 2001-09-10
DOI: 10.1109/ICDAR.2001.953799 (https://doi.org/10.1109/ICDAR.2001.953799)
This paper describes a method for extracting text lines. Segmentation of a printed binary document is typically based on horizontal projection analysis followed by regrouping of the connected components. These techniques cannot be used for unconstrained handwritten text, because the data frequently contain undulations and shifts in the baseline, baseline-skew variability, and inter-line distance variability. We therefore consider that the border line between text lines in an unconstrained handwritten document should be a collection of horizontal line segments. From this point of view, we use a method based on partial contour following to detect the separating lines. In the current version of our algorithm, we first detect the text slant and evaluate the number of text lines using partial projection. We then carry out a partial contour following of every line, first in the direction of the writing and then in the opposite direction. After this processing, adjacent lines are separated. In the experimental section, we describe the application of the algorithm to text-line extraction. The image database contains about one hundred handwritten Arabic texts written by different writers. Results on the assignment of diacritical points are also reported.
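To illustrate why a global horizontal projection can fail on handwritten text while a partial projection still separates the lines, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say how the partial projection is computed, so the vertical strips, the `strip_width` and `min_ink` parameters, the majority vote over strips, and all function names are assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def horizontal_projection(image: Image, col_start: int, col_end: int) -> List[int]:
    """Count ink pixels in each row, restricted to columns [col_start, col_end)."""
    return [sum(row[col_start:col_end]) for row in image]


def count_text_bands(profile: List[int], min_ink: int = 1) -> int:
    """Count maximal runs of consecutive rows whose ink count is at least min_ink."""
    bands = 0
    inside = False
    for value in profile:
        if value >= min_ink and not inside:
            bands += 1
        inside = value >= min_ink
    return bands


def estimate_line_count(image: Image, strip_width: int) -> int:
    """Estimate the number of text lines from per-strip (partial) projections.

    Assumption: the estimate is the band count seen in the most strips,
    with ties resolved toward the larger count.
    """
    width = len(image[0])
    counts = []
    for start in range(0, width, strip_width):
        end = min(start + strip_width, width)
        counts.append(count_text_bands(horizontal_projection(image, start, end)))
    return max(set(counts), key=lambda c: (counts.count(c), c))


# Toy page: two text lines that drift downward from the left half to the right half.
ROWS = [
    "00000000",
    "11110000",
    "11111111",
    "00001111",
    "11110000",
    "11111111",
    "00001111",
    "00000000",
]
IMAGE: Image = [[int(ch) for ch in row] for row in ROWS]

# Projection over the full width: the drift fills the inter-line gap.
global_bands = count_text_bands(horizontal_projection(IMAGE, 0, len(IMAGE[0])))
# Projection per 4-column strip: each strip still shows a gap between the lines.
partial_estimate = estimate_line_count(IMAGE, strip_width=4)
```

On this toy image the full-width projection merges the two drifting lines into a single band, whereas each 4-column strip shows two bands, so the partial estimate is two. Real handwriting would also need the slant correction and contour-following steps that the abstract describes, which this sketch does not attempt.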