Authors: T. Saba, A. Almazyad, A. Rehman
Venue: 2015 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)
Published: 2015-12-01
DOI: 10.1109/EAIS.2015.7368806 (https://doi.org/10.1109/EAIS.2015.7368806)
Language independent rule based classification of printed & handwritten text
Handwriting in data entry forms and documents usually represents information filled in by the user, which should be treated differently from the printed text. In the Arab world, this filled-in information is normally written in English or Arabic. Moreover, classification approaches differ considerably between machine-printed text and handwritten script. Therefore, prior to segmentation and classification, the text must be separated into printed and handwritten entries. This research addresses the problem of language-independent text distinction in multilingual data entry forms. Our main focus is to distinguish machine-printed text from handwritten script in such forms without relying on the language of the text. The proposed approach explores new statistical and structural features of text lines to classify them into separate categories. Accordingly, a set of classification rules is derived to explicitly differentiate machine-printed and handwritten entries written in any language. An additional novelty of the proposed approach is that no training or training data is required; rather, text is discriminated on the basis of simple rules. Promising experimental results, with 90% accuracy, indicate that the proposed approach is simple and robust. Finally, the scheme is independent of the language, style, size, and fonts that commonly co-exist in multilingual data entry forms.
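The abstract does not list the specific features or rules used, so the following Python sketch only illustrates the general idea of training-free, rule-based classification of a text line. The two features (variation in component height and scatter of component bottoms around the baseline), the thresholds, and the names `classify_line`, `HEIGHT_CV_MAX`, and `BASELINE_STD_MAX` are all assumptions made for illustration, not the authors' method. It also assumes that bounding boxes of the connected components in each line have already been extracted.

```python
from statistics import mean, pstdev

# Hypothetical thresholds, chosen for illustration only (not from the paper).
HEIGHT_CV_MAX = 0.15      # max coefficient of variation of component heights
BASELINE_STD_MAX = 1.5    # max std. dev. of component bottom edges, in pixels


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (0.0 if mean is 0)."""
    m = mean(values)
    return pstdev(values) / m if m else 0.0


def classify_line(boxes):
    """Label one text line as 'printed' or 'handwritten'.

    boxes: list of (x, y, width, height) bounding boxes for the connected
    components of the line, where y is the top edge in image coordinates.
    """
    heights = [h for (_, _, _, h) in boxes]
    bottoms = [y + h for (_, y, _, h) in boxes]

    # Statistical feature: how uniform are the component heights?
    height_cv = coefficient_of_variation(heights)
    # Structural feature: how tightly do components sit on a common baseline?
    baseline_std = pstdev(bottoms)

    # Simple rule: regular size and a straight baseline suggest machine print.
    if height_cv <= HEIGHT_CV_MAX and baseline_std <= BASELINE_STD_MAX:
        return "printed"
    return "handwritten"


# Synthetic example lines (made-up coordinates, not experimental data).
printed_line = [(0, 10, 8, 20), (10, 10, 8, 20), (20, 10, 9, 20), (31, 10, 8, 20)]
handwritten_line = [(0, 8, 9, 24), (12, 13, 7, 15), (22, 6, 11, 29), (36, 12, 8, 18)]

print(classify_line(printed_line))
print(classify_line(handwritten_line))
```

Because the rule uses only geometric regularity and never inspects character identity, a sketch of this kind is language independent in the same sense the abstract describes. Real printed text contains ascenders and descenders, so practical thresholds would need tuning on actual forms.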