A Novel Segmentation Technique for Urdu Type-Written Text
Atif Mahmood, Ankita Srivastava
2018 Recent Advances on Engineering, Technology and Computational Sciences (RAETCS), 2018-02-01. DOI: 10.1109/RAETCS.2018.8443958
Text segmentation is the process of subdividing a text image into its constituent parts, such as text lines, words and isolated characters. It is the first module in the design of optical character recognition (OCR) systems. Automatic text segmentation is becoming an increasingly important problem. The main difficulties arise from the lack of standard datasets, the wide diversity of objectives and the lack of meaningful quantitative evaluation. This paper proposes a new technique that segments Urdu typewritten text into text lines based on the edge information of connected components. The technique is evaluated on the benchmark dataset with precision and recall metrics, achieving a precision of 87.36% and a recall of 84.75%. The collection, compilation and organization of the dataset are also part of this research.
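The abstract names two concrete ingredients: grouping connected components into text lines using their edge information, and scoring the result with precision and recall. The following is a minimal sketch of both under stated assumptions, not the authors' algorithm. It assumes a binarized page given as a list of rows with 1 for ink, takes "edge information" to mean the top and bottom edges of each component's bounding box, and merges components whose vertical extents overlap into one line. For evaluation it assumes a predicted line counts as correct when its row span matches a not-yet-matched ground-truth line with an intersection-over-union of at least 0.5 (the `threshold` parameter). The function names `connected_components`, `group_into_lines`, `row_iou` and `precision_recall` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# Bounding box of one connected component: (top, bottom, left, right), inclusive pixel indices.
Box = Tuple[int, int, int, int]


def connected_components(img: List[List[int]]) -> List[Box]:
    """Label 8-connected ink pixels (value 1) and return each component's bounding box."""
    rows, cols = len(img), (len(img[0]) if img else 0)
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            top, bottom, left, right = r, r, c, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, bottom, left, right))
    return boxes


def group_into_lines(boxes: List[Box]) -> List[Tuple[int, int]]:
    """Assumed rule: merge components whose top/bottom edges overlap vertically into one line."""
    lines: List[List[int]] = []
    for top, bottom, _, _ in sorted(boxes):  # sorted by top edge first
        if lines and top <= lines[-1][1]:
            lines[-1][1] = max(lines[-1][1], bottom)  # extend the current line band
        else:
            lines.append([top, bottom])  # gap found: start a new line band
    return [(t, b) for t, b in lines]


def row_iou(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Intersection-over-union of two inclusive row spans."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def precision_recall(pred, truth, threshold=0.5):
    """Greedy one-to-one matching of predicted to ground-truth lines (assumed criterion)."""
    used = set()
    hits = 0
    for p in pred:
        for i, t in enumerate(truth):
            if i not in used and row_iou(p, t) >= threshold:
                used.add(i)
                hits += 1
                break
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy binarized page: two lines of ink separated by blank rows.
    page = [
        [1, 1, 0, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
    lines = group_into_lines(connected_components(page))
    print(lines)  # [(0, 1), (4, 5)]
    print(precision_recall(lines, [(0, 1), (4, 5)]))  # (1.0, 1.0)
```

Interval merging on top and bottom edges is deliberately simple: dots or diacritics whose vertical span falls inside an existing line band are absorbed into that line, but isolated marks above or below the band would start a spurious line, and skewed or touching lines would be merged. A practical Urdu line segmenter would need a more careful rule for both cases.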